1 Packages
2 Worked Example
- 2.1 One-Sided Binomial Test
  - 2.1.1 Simulation
  - 2.1.2 Exact Calculation
- 2.2 Two-Sided Binomial Test
  - 2.2.1 Simulation
  - 2.2.2 Exact Calculation
3 Non-Equivalence with the Chi-Squared and Z-Tests
- 3.1 Which to Use?

Some events have only two possible outcomes: a coin toss can result in either a head or a tail. Others can be simplified into having only two possible outcomes: rolling a die has six possible outcomes but we can simplify the result by saying that it is either a six or not a six. For these types of events - with binary outcomes - we can use the binomial test to establish if they are truly random or not to within a certain level of significance.

1 Packages

The code on this page uses the numpy, scipy and statsmodels packages. These can be installed from the terminal with:

$ python3.11 -m pip install numpy
$ python3.11 -m pip install scipy
$ python3.11 -m pip install statsmodels

where python3.11 corresponds to the version of Python you have installed and are using.

Import these packages into Python with:

import numpy as np
from scipy import stats as st
from statsmodels.stats import proportion

2 Worked Example

This example comes from the Wikipedia page:

If we roll a die 235 times and 6 comes up 51 of those times, is the die fair?

We would expect a 6 to come up 1/6th of the time, ie \(235 / 6 = 39.17\) times in 235 rolls. So it certainly seems as if the die might be biased towards rolling a 6 more often than expected.

In more statistically-rigorous language, the number of successes we have observed is 51 out of a total of 235 trials where the hypothesized probability of success is 1/6. These numbers are usually known as k, n and p respectively:

#
# Set up the example
#
# The number of successes
k = 51
# The number of trials
n = 235
# The hypothesized probability of success
p = 1 / 6

2.1 One-Sided Binomial Test

Under the null hypothesis that the die is indeed fair and thus follows a binomial distribution with \(n=235\) and \(p=1/6\) with regards to landing on 6, we can test whether the die in the example produces a greater than expected number of 6s:

\(H_0 : \pi = \pi_0\)
\(H_1 : \pi \geq \pi_0\)

with true proportion \(\pi\) and expected proportion \(\pi_0 = 1/6\).

2.1.1 Simulation

Let’s do this first by simulating many (specifically, 10,000) batches of 235 rolls:

# Set the seed so that we get the same random numbers each time this code runs
np.random.seed(20230814)

# Count the number of times we get a more extreme (or equally extreme) number
# of 6s than the number we actually observed
more_extreme = 0
# Iterate through 10,000 simulations
i = 0
while i < 10000:
    # Simulate rolling a die n times with the result either being a 6 (True)
    # or not (False) with probability p
    outcomes = np.random.choice([True, False], size=n, p=[p, 1 - p])
    # Count the number of 6s
    successes = np.sum(outcomes)
    # If the number of 6s is more or equally as extreme as observed, count it
    if successes >= k:
        more_extreme += 1
    # Advance the loop counter
    i += 1
# Calculate the p-value
p_value = more_extreme / 10000

print(f'p = {p_value:.4f}')

## p = 0.0262

This is close to the true value result on the Wikipedia page: 0.02654.

2.1.2 Exact Calculation

SciPy has a binomtest() function which will calculate the p-value exactly:

result = st.binomtest(k, n, p, alternative='greater')

print(f'p = {result.pvalue:.5f}')

## p = 0.02654

This matches Wikipedia.

See the documentation for this function here.

2.2 Two-Sided Binomial Test

A more robust test would be to see if the die is more or less likely to roll a 6:

\(H_0 : \pi = \pi_0\)
\(H_1 : \pi \neq \pi_0\)

with true proportion \(\pi\) and expected proportion \(\pi_0 = 1/6\). Calculating the p-values is not a simple case of doubling the value from the one-sided test because the binomial distribution is not symmetric (unless \(p=0.5\)). We have to consider both tails individually:

2.2.1 Simulation

Again, let’s simulate 10,000 batches of 235 die rolls:

# Count the number of times we get a more extreme (or equally extreme) number
# of 6s than the number we actually observed
more_extreme = 0
# k is the upper-tail, so calculate the lower-tail
expected = n * p
lower_tail = (expected - (k - expected))
# Iterate through 10,000 simulations
i = 0
while i < 10000:
    # Simulate rolling a die n times with the result either being a 6 (True)
    # or not (False) with probability p
    outcomes = np.random.choice([1, 0], size=n, p=[p, 1 - p])
    # Count the number of 6s
    successes = np.sum(outcomes)
    # If the number of 6s is more or equally as extreme as observed, count it
    if successes >= k:
        more_extreme += 1
    # If the number of 6s is more or equally as extreme as the lower tail
    if successes <= lower_tail:
        more_extreme += 1
    # Advance the loop counter
    i += 1
# Calculate the p-value
p_value = more_extreme / 10000

print(f'p = {p_value:.4f}')

## p = 0.0421

This value is again close to Wikipedia’s answer, which is 0.0437.

2.2.2 Exact Calculation

result = st.binomtest(k, n, p)

print(f'p = {result.pvalue:.4f}')

## p = 0.0437

This matches Wikipedia. Using \(p = 0.05\) as our significance threshold (equivalently, our false positive rate: the probability of incorrectly finding a significant difference when there really is none) allows us to reject the null hypothesis and conclude that the die is not fair.

3 Non-Equivalence with the Chi-Squared and Z-Tests

Both the \(\chi^2\) goodness-of-fit test and the one-sample Z-test for a proportion can be used to answer this example question:

# Perform the chi-squared goodness-of-fit test
observed = [51, 235 - 51]
expected = [235 / 6, 5 * 235 / 6]
chisq, p = st.chisquare(observed, expected)

print(f'χ² = {chisq:.2f}, p = {p:.3f}')

## χ² = 4.29, p = 0.038

# Perform a one-sample Z-test for a proportion
count = 51
nobs = 235
pi_0 = 1 / 6
z, p = proportion.proportions_ztest(count, nobs, value=pi_0, prop_var=pi_0)

print(f'Z = {z:.2f}, p = {p:.3f}')

## Z = 2.07, p = 0.038

Note that both tests give the same p-value and that the \(\chi^2\) statistic is the square of the Z statistic in binomial examples:

print(z**2 == round(chisq, 14))

## True

The reason that these tests give the same result is because the \(\chi^2\) goodness-of-fit test is derived from the Z-statistic.¹ For more on \(\chi^2\) tests see the page here and for more on the one-sample Z-test for a proportion see the page here.

3.1 Which to Use?

The binomial test is an exact test and so, in general, it should be used over the \(\chi^2\) test or the Z-test which are approximations.² However, it’s worth noting that, due to the central limit theorem, as the sample size gets larger the binomial distribution starts getting more similar to the normal distribution and, hence, the results of the three tests converge³:

# Perform a binomial test with a very large sample size
k = 4000
n = 23500
p = 1 / 6
result = st.binomtest(k, n, p)

print(f'p = {result.pvalue:.3f}')

## p = 0.146

# Perform a one-sample Z-test for a proportion with a very large sample size
count = 4000
nobs = 23500
pi_0 = 1 / 6
z, p = proportion.proportions_ztest(count, nobs, value=pi_0, prop_var=pi_0)

print(f'p = {p:.3f}')

## p = 0.145

These values of 0.146 and 0.145 are very close! So, if our choice of test makes almost no difference when \(n\) is large, is there ever an instance when we would prefer the \(\chi^2\) or Z-test over the binomial? Here are some potential reasons⁴^, ⁵:

A Z-test allows us to compute a confidence interval for our sample proportion
We can easily estimate statistical power for a Z-test but not for a binomial test
A Z-test is computationally less heavy, especially for larger sample sizes

So when can we use a Z-test instead of a binomial test? A rule of thumb is that \(p \times n\) and \((1 - p) \times n\) must both be greater than 5, where \(p\) is the hypothesized probability of success and \(n\) the sample size.

⇦ Back

Wallis, S. “z-squared: The Origin and Application of χ²”. Journal of Quantitative Linguistics 2013; 20:350-378. DOI: 10.1080/09296174.2013.830554. Available here. Jump to reference: ↩︎
GraphPad Software, LLC. “The binomial test”. Statistics Guide 2023. Available here. Jump to reference: ↩︎
Soetewey, A. “One-proportion and chi-square goodness of fit test”. Stats and R 2020. Available here. Jump to reference: ↩︎
Van den Berg, RG. “Binomial Test – Simple Tutorial”. SPSS Tutorials 2023. Available here. Jump to reference: ↩︎
Hessing, T. “One and Two Sample Z Proportion Hypothesis Tests”. 6σ Study Guide. Available here. Jump to reference: ↩︎

Statistics in Python:The Binomial Test