spkit.stats.outliers

spkit.stats.outliers(x, method='iqr', k=1.5, include_lower=True, include_upper=True, return_thr=False)

Statistical Outliers

This function computes the lower and upper limits beyond which all points are assumed to be outliers.

IQR - Interquartile Range

\[ \begin{align}\begin{aligned}l_t = Q1 - k \times (Q3-Q1)\\u_t = Q3 + k \times (Q3-Q1)\end{aligned}\end{align} \]

where \(k=1.5\) by default, Q1 is the first quartile, and Q3 is the third quartile
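
As an illustration, the IQR limits can be sketched with plain NumPy. This is a minimal sketch using the conventional Tukey fences (lower limit at Q1 - k·(Q3-Q1), upper limit at Q3 + k·(Q3-Q1)), not the library's own implementation. The variable names, the strict inequalities, and the use of `np.quantile` with its default interpolation are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
x[:3] = [8.0, -9.0, np.nan]      # two obvious outliers and one NaN

xi = x[~np.isnan(x)]             # NaNs are dropped before computing limits
q1, q3 = np.quantile(xi, [0.25, 0.75])

k = 1.5
lt = q1 - k * (q3 - q1)          # lower limit
ut = q3 + k * (q3 - q1)          # upper limit

# Assumption: points strictly outside [lt, ut] count as outliers
is_outlier = (xi < lt) | (xi > ut)
print(f"lt={lt:.3f}, ut={ut:.3f}, n_outliers={is_outlier.sum()}")
```

Increasing `k` widens the interval between the two limits, so fewer points are flagged.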

Standard Deviation

\[ \begin{align}\begin{aligned}l_t = - k \times SD(x)\\u_t = k \times SD(x)\end{aligned}\end{align} \]

where \(k=1.5\) by default and \(SD(\cdot)\) is the standard deviation
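
The standard-deviation limits can be sketched in the same way. This is a hedged illustration rather than the library's code: it places the limits symmetrically at a distance of k·SD(x) from zero, which is only appropriate for roughly zero-mean data. The mean-centered variant (`lt_c`, `ut_c`) is an assumption added here for data that is not centered; this page does not state whether the function applies such a shift.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
x[10] = 6.0                      # one obvious outlier

xi = x[~np.isnan(x)]             # drop NaNs, as for the IQR method
k = 2.0
sd = np.std(xi)

# Symmetric limits around zero (suitable for roughly zero-mean data)
lt, ut = -k * sd, k * sd

# Assumption for illustration: center the limits on the sample mean
mu = np.mean(xi)
lt_c, ut_c = mu - k * sd, mu + k * sd

is_outlier = (xi < lt) | (xi > ut)
print(f"lt={lt:.3f}, ut={ut:.3f}, n_outliers={is_outlier.sum()}")
```

Because the standard deviation is itself inflated by extreme values, this rule tends to be less robust than the IQR rule when outliers are large or numerous.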

Parameters:
x: 1d array or list
  • if x includes NaNs, they are excluded

method: str {'iqr', 'sd'}, default='iqr'
  • method used to compute the lower/upper limits, according to the equations above

k: scalar, default k=1.5
  • multiplier used in the equations above

include_lower: bool, default=True
  • if False, the lower threshold is excluded, so points below it are not treated as outliers

include_upper: bool, default=True
  • if False, the upper threshold is excluded, so points above it are not treated as outliers

return_thr: bool, default=False
  • if True, lower and upper thresholds are returned

Returns:
x_outlr: outliers identified from x
idx: indices
  • indices of the outliers in x, after removing NaNs

  • indices are for xi, xi = x[~np.isnan(x)]

idx_bin: bool array
  • boolean mask marking the outliers

(lt, ut): tuple
  • lower and upper limits

  • returned only if return_thr is True
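
Because idx refers to positions in xi = x[~np.isnan(x)], it does not line up with x when x contains NaNs. The sketch below shows one way to map such indices back to positions in the original array. It does not call the library: `idx` is a stand-in computed with the IQR rule in NumPy, and `idx_in_x` is a hypothetical name introduced for this example.

```python
import numpy as np

x = np.array([0.1, np.nan, -0.2, 9.0, 0.3, np.nan, 0.0, -0.1, 0.2, -8.0])

valid = ~np.isnan(x)             # mask of non-NaN entries
xi = x[valid]                    # the array that the indices refer to

# Stand-in for the returned idx: IQR rule with k=1.5, positions in xi
q1, q3 = np.quantile(xi, [0.25, 0.75])
lt, ut = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
idx = np.flatnonzero((xi < lt) | (xi > ut))

# Map positions in xi back to positions in the original x
idx_in_x = np.flatnonzero(valid)[idx]

print(idx)        # positions in xi
print(idx_in_x)   # positions in x
```

With this mapping, `x[idx_in_x]` and `xi[idx]` select the same values, which is what is needed to plot outliers against the original time axis.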

Examples

#sp.stats.outliers
import numpy as np
import matplotlib.pyplot as plt
import spkit as sp

np.random.seed(1)
x = np.random.randn(1000)
t = np.arange(len(x))
np.random.seed(None)

# idx1/idx2 index x after NaN removal; x has no NaNs here, so they also index x and t
x_outlr1, idx1, _, (lt1,ut1) = sp.stats.outliers(x,method='iqr',return_thr=True)
x_outlr2, idx2, _, (lt2,ut2) = sp.stats.outliers(x,method='sd',k=2,return_thr=True)

plt.figure(figsize=(12,4))
plt.subplot(121)
plt.plot(t,x,'o',color='C0',alpha=0.8)
plt.plot(t[idx1],x[idx1],'o',color='C3')
plt.axhline(lt1,color='k',ls='--',lw=1)
plt.axhline(ut1,color='k',ls='--',lw=1)
plt.title('Outliers using IQR')
plt.ylabel('x')
plt.subplot(122)
plt.plot(t,x,'o',color='C0',alpha=0.8)
plt.plot(t[idx2],x[idx2],'o',color='C3')
plt.axhline(lt2,color='k',ls='--',lw=1)
plt.axhline(ut2,color='k',ls='--',lw=1)
plt.ylabel('x')
plt.title('Outliers using SD')
plt.show()