2 Chauvenet’s Criterion

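Chauvenet’s criterion is one of several methods for identifying outliers in a dataset. It assumes the data are normally distributed and rejects an observation if, in a sample of that size, fewer than half an observation would be expected to lie that far from the mean. The example on the Wikipedia page uses the following values: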

# Wikipedia example
# https://en.wikipedia.org/wiki/Chauvenet%27s_criterion#Example
observations = [9, 10, 10, 10, 11, 50]

The calculation can be implemented in Python as follows:

import numpy as np
from scipy import stats as st

# Sample size
n = len(observations)
# Cumulative probability at the upper cutoff; each tail beyond it holds 1 / (4 * n)
P_z = 1 - (1 / (4 * n))

# Maximum allowable deviation
D_max = st.norm.ppf(P_z)

# Mean
x_bar = np.mean(observations)
# Sample standard deviation
s = np.std(observations, ddof=1)
# z-score of 50
z = (50 - x_bar) / s

print(f'P_z = {P_z:.4f}, D_max = {D_max:.4f}, z = {z:.2f}')

## P_z = 0.9583, D_max = 1.7317, z = 2.04
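
Because P_z depends only on the sample size, so does the cutoff D_max. The following is a minimal sketch of how the threshold grows with n; the sample sizes are arbitrary values chosen for illustration, and the `thresholds` dictionary is not part of the original example.

```python
from scipy import stats as st

# Sample sizes chosen purely for illustration
sample_sizes = [5, 10, 50, 100, 1000]

thresholds = {}
for n in sample_sizes:
    # Cumulative probability at the upper cutoff for this sample size
    P_z = 1 - 1 / (4 * n)
    # Maximum allowable deviation, in standard deviations
    thresholds[n] = st.norm.ppf(P_z)
    print(f'n = {n:4d}, P_z = {P_z:.5f}, D_max = {thresholds[n]:.4f}')
```

Larger samples tolerate larger deviations before a value is flagged, since extreme values become more likely as more observations are drawn.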

The same test can be applied to the entire dataset:

# Convert to an array so the arithmetic below is element-wise
obs = np.array(observations)
# z-scores
z_scores = (obs - x_bar) / s

# Test z-scores against the maximum allowable deviation
reject = np.abs(z_scores) > D_max
kept = obs[~reject]
rejected = obs[reject]

print(kept, rejected)

## [ 9 10 10 10 11] [50]
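
Comparing |z| against D_max is equivalent to asking whether the expected number of observations at least that far from the mean is below 0.5. The sketch below shows that formulation, assuming the same normal model; `expected_counts` is an illustrative name that does not appear in the original code.

```python
import numpy as np
from scipy import stats as st

obs = np.array([9, 10, 10, 10, 11, 50])
n = obs.size
z_scores = (obs - obs.mean()) / obs.std(ddof=1)

# Two-tailed probability of a deviation at least as large as each |z|,
# scaled by n to give the expected number of such observations
expected_counts = n * 2 * st.norm.sf(np.abs(z_scores))

# Reject where fewer than half an observation would be expected
reject = expected_counts < 0.5
print(obs[~reject], obs[reject])
```

Both forms flag the same values; this one makes the "half an observation" rule explicit, while the D_max form avoids computing a tail probability for every point.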

Here’s the above wrapped up into a function:

def chauvenets_criterion(obs):
    """
    Identify and remove outliers using Chauvenet's criterion.

    From Wikipedia (https://en.wikipedia.org/wiki/Chauvenet%27s_criterion):
    "In statistical theory, Chauvenet's criterion (named for William Chauvenet)
    is a means of assessing whether one piece of experimental data — an outlier
    — from a set of observations, is likely to be spurious."   

    Parameters
    ----------
    obs : array-like
        Set of observations.

    Returns
    -------
    kept : numpy.ndarray
        The observations that have not been rejected.
    rejected : numpy.ndarray
        The observations that have been rejected.

    Examples
    --------
    This example comes from Wikipedia:
    https://en.wikipedia.org/wiki/Chauvenet%27s_criterion#Example

    >>> kept, rejected = chauvenets_criterion([9, 10, 10, 10, 11, 50])
    >>> print(kept)
    [ 9 10 10 10 11]
    >>> print(rejected)
    [50]
    """
    # Convert to an array up front so the arithmetic below is element-wise
    obs = np.asarray(obs)
    # Sample size
    n = len(obs)
    # Cumulative probability at the upper cutoff; each tail beyond it holds 1 / (4 * n)
    P_z = 1 - (1 / (4 * n))

    # Maximum allowable deviation
    D_max = st.norm.ppf(P_z)

    # Mean
    x_bar = np.mean(obs)
    # Sample standard deviation
    s = np.std(obs, ddof=1)
    # z-scores
    z_scores = (obs - x_bar) / s

    # Test z-scores against the maximum allowable deviation
    reject = np.abs(z_scores) > D_max
    kept = obs[~reject]
    rejected = obs[reject]

    return kept, rejected


# Wikipedia example
# https://en.wikipedia.org/wiki/Chauvenet%27s_criterion#Example
observations = [9, 10, 10, 10, 11, 50]
kept, rejected = chauvenets_criterion(observations)

print(kept, rejected)

## [ 9 10 10 10 11] [50]
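
When the observations are paired with other data, a Boolean mask can be more convenient than two separate arrays. The following is one possible variant, not the function above: `chauvenet_mask` is a hypothetical helper, the labels are made up for illustration, and the choice to reject nothing when the spread is zero is an assumption rather than part of the criterion.

```python
import numpy as np
from scipy import stats as st


def chauvenet_mask(obs):
    """Hypothetical helper: True where Chauvenet's criterion rejects a value."""
    obs = np.asarray(obs, dtype=float)
    n = obs.size
    # The sample standard deviation needs at least two values
    s = np.std(obs, ddof=1) if n > 1 else 0.0
    # Assumption: with zero spread no z-score is defined, so reject nothing
    if s == 0:
        return np.zeros(n, dtype=bool)
    # Maximum allowable deviation for this sample size
    D_max = st.norm.ppf(1 - 1 / (4 * n))
    z_scores = (obs - np.mean(obs)) / s
    return np.abs(z_scores) > D_max


observations = np.array([9, 10, 10, 10, 11, 50])
labels = np.array(['a', 'b', 'c', 'd', 'e', 'f'])  # made-up labels

mask = chauvenet_mask(observations)
# The same mask selects from both arrays, keeping them aligned
print(observations[~mask], labels[mask])

# Identical values have zero spread, so nothing is flagged
print(chauvenet_mask([10, 10, 10]))
```

For the Wikipedia data the mask flags only the last observation, matching the result above.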
