spkit.bin_width

spkit.bin_width(x, method='fd')

Compute the bin width for a histogram using one of several estimators.

‘fd’ (Freedman-Diaconis estimator)
  • Robust (resilient to outliers) estimator that takes into account data variability and data size.

‘doane’
  • An improved version of Sturges’ estimator that works better with non-normal datasets.

‘scott’
  • Less robust estimator that takes into account data variability and data size.

‘stone’
  • Estimator based on leave-one-out cross-validation estimate of the integrated squared error.

    Can be regarded as a generalization of Scott’s rule.

‘rice’
  • Estimator does not take variability into account, only data size.

    Commonly overestimates number of bins required.

‘sturges’
  • Only accounts for data size. Optimal only for Gaussian data; underestimates the number of bins for large non-Gaussian datasets.

‘sqrt’
  • Square root (of data size) estimator, used by Excel and other programs for its speed and simplicity.
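To make the descriptions above concrete, here is a minimal sketch of the textbook closed-form rules for five of the estimators, written with NumPy only. The helper name `sketch_bin_width` is hypothetical and is not part of spkit; the formulas follow the commonly cited definitions and may differ in detail from what `spkit.bin_width` actually computes. ‘doane’ and ‘stone’ are left out to keep the sketch short.

```python
import numpy as np

def sketch_bin_width(x, method="fd"):
    """Hypothetical helper: textbook bin-width rules (not spkit's implementation)."""
    x = np.asarray(x, dtype=float).ravel()   # assumption: n-d input is flattened
    n = x.size
    span = x.max() - x.min()                 # data range

    if method == "fd":
        # Freedman-Diaconis: 2 * IQR / n^(1/3); the IQR makes it robust to outliers
        q75, q25 = np.percentile(x, [75, 25])
        return 2.0 * (q75 - q25) / n ** (1.0 / 3.0)
    if method == "scott":
        # Scott: about 3.49 * std / n^(1/3); the std is sensitive to outliers
        return (24.0 * np.sqrt(np.pi) / n) ** (1.0 / 3.0) * np.std(x)
    if method == "sturges":
        # Sturges: log2(n) + 1 bins, depends on data size only
        return span / (np.log2(n) + 1.0)
    if method == "rice":
        # Rice: 2 * n^(1/3) bins, depends on data size only
        return span / (2.0 * n ** (1.0 / 3.0))
    if method == "sqrt":
        # Square root: sqrt(n) bins
        return span / np.sqrt(n)
    raise ValueError(f"method not covered by this sketch: {method!r}")

x = np.arange(100.0)
for m in ("fd", "scott", "sturges", "rice", "sqrt"):
    print(m, sketch_bin_width(x, m))
```

Note that ‘fd’ and ‘scott’ use a spread statistic (IQR or standard deviation), while ‘sturges’, ‘rice’ and ‘sqrt’ only divide the data range by a bin count derived from the sample size.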

Parameters:
x : 1d-array or (n-d array)
method : method to compute bin width and number of bins
Returns:
bw : bin width
k : number of bins

See also

hist_plot : Histogram plot with optimal number of bins

References

  • wikipedia

Examples

>>> import numpy as np
>>> import spkit as sp
>>> np.random.seed(1)
>>> t = np.linspace(0,2,200)
>>> x1 = np.cos(2*np.pi*1*t) + 0.01*np.random.randn(len(t))  # less noisy
>>> x2 = np.cos(2*np.pi*1*t) + 0.9*np.random.randn(len(t))  # very noisy
>>> bw1, k1 = sp.bin_width(x1, method='fd')
>>> bw2, k2 = sp.bin_width(x2, method='fd')
>>> print(r'Optimal: bin-width of x1 = ',bw1,'\t Number of bins = ',k1)
>>> print(r'Optimal: bin-width of x2 = ',bw2,'\t Number of bins = ',k2)
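
To see how the choice of estimator changes the result on the same kind of signal, the sketch below loops over all seven method names using NumPy's `np.histogram_bin_edges`, which accepts the same estimator names. This is a comparison written with NumPy alone; whether `spkit.bin_width` returns identical values is an assumption that should be checked against the spkit source.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 2, 200)
# Noisy cosine, similar in spirit to x2 in the example above
x = np.cos(2 * np.pi * 1 * t) + 0.9 * rng.standard_normal(len(t))

methods = ["fd", "doane", "scott", "stone", "rice", "sturges", "sqrt"]
results = {}
for m in methods:
    edges = np.histogram_bin_edges(x, bins=m)   # equal-width bin edges
    bw = edges[1] - edges[0]                    # bin width
    k = len(edges) - 1                          # number of bins
    results[m] = (bw, k)
    print(f"{m:>8s}: bin-width = {bw:.4f}\t number of bins = {k}")
```

The size-only rules (‘rice’, ‘sturges’, ‘sqrt’) give a bin count fixed by `len(x)`, whereas ‘fd’, ‘doane’, ‘scott’ and ‘stone’ change with the spread or shape of the data.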

Examples using spkit.bin_width

Entropy - Real-Valued Source
