The code on this page uses the NumPy, Matplotlib and SciPy packages. These can be installed from the terminal with:
$ python3.11 -m pip install numpy
$ python3.11 -m pip install matplotlib
$ python3.11 -m pip install scipy
Replace python3.11
with the version of Python being used. Once installed, these packages can be imported into a Python script via the following:
import numpy as np
from matplotlib import pyplot as plt
from scipy import stats as st
Settings specific to this tutorial:
# Set the seed so that the random numbers are the same each time
np.random.seed(20230830)
# Settings
A = 6 # Want figures to be A6
plt.rc('figure', figsize=[46.82 * .5**(.5 * A), 33.11 * .5**(.5 * A)])
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
If designing an experiment with two groups (usually one will be an experimental group and the other a control group) that will produce numeric data then the sample sizes that will be needed for each group depend on whether the means or the proportions of the two groups will be compared:
Two groups of numeric data can be represented on a box-and-whisker plot (aka a box plot), although this shows the median - not the mean - by default. If we are interested in comparing the means then it makes sense to draw these in:
fake_data = (10 - 1) * np.random.random_sample((10, 2)) + 1
plt.boxplot(
fake_data,
labels=['Group 1', 'Group 2'],
showmeans=True, meanline=True,
medianprops={'c': 'k'},
meanprops={'c': 'grey', 'lw': 2, 'ls': (0, (1, 1))}
)
means = np.mean(fake_data, axis=0)
plt.text(0.85, means[0], r'$\mu_1$')
plt.text(1.85, means[1], r'$\mu_2$')
title = r'\begin{center}' + \
r'Representative Example of Two Groups of Numeric Data\\' + \
r'\normalsize (with means $\mu_1$ and $\mu_2$)' + \
r'\end{center}'
plt.title(title)
plt.ylabel('Measurements')
plt.show()
If the experiment is looking to establish whether or not the two groups’ means are different then the hypotheses are:
where \(\mu_1\) is the mean of Group 1 and \(\mu_2\) is the mean of group 2. When designing an experiment these values will need to be estimated. This can be done by performing a pilot test or by reviewing similar experiments in the literature.
Next, the standard deviation of the measurements needs to be considered: during the experiment the two samples will be drawn from a population and the dispersion of these measurements in the entire population, \(\sigma\), might be known.
If the true population standard deviation, \(\sigma\), is known then the minimum sample sizes, \(n_1\) and \(n_2\), that will be required for the experiments two groups are given by[1 pg 58]:
\[n_1 = \kappa n_2\] \[n_2 = \left( 1 + \dfrac{1}{\kappa} \right) \left( \sigma \dfrac{z_{1 - \alpha/2} + z_{1 - \beta}}{\mu_1 - \mu_2} \right)^2\]
where:
Here’s a worked example[2]:
# Mean of group 1, μ_1
mu_1 = 5
# Mean of group 2, μ_2
mu_2 = 10
# Sampling ratio, κ = n_1 / n_2
kappa = 1
# Population standard deviation, σ
sigma = 10
# Type I error rate, α
alpha = 0.05
# Type II error rate, β
beta = 0.20
# Sample size estimation
n_2 = (1 + 1 / kappa) * (
sigma * (
st.norm.ppf(1 - alpha / 2) + st.norm.ppf(1 - beta)
) / (mu_1 - mu_2)
)**2
n_2 = np.ceil(n_2)
print(f'Sample size = {n_2:n} for both groups')
## Sample size = 63 for both groups
To repeat: mu_1
and mu_2
are estimated from a pilot test or from the literature, sigma
is known and kappa
, alpha
and beta
are set by the experimenter.
If the population standard deviation, \(\sigma\), is not known then it can be estimated by the pooled sample standard deviation, \(s\), which is calculated as per equation 3.2.1 in the source.[1 pg 57] This, again, should come from pilot data or from the literature. The sample sizes for the experiment’s groups (\(n_1\) and \(n_2\), not to be confused with the sample sizes for the pilot study’s groups) can then be calculated by solving the following[1 pg 59]:
\[n_1 = \kappa n_2\] \[\mathcal{T}_{(1 + \kappa) n_2 - 2} \left( t_{\alpha / 2, (1 + \kappa) n_2 - 2} \Biggr\rvert \dfrac{|\epsilon|}{s} \sqrt{\dfrac{n_2}{1 + 1 / \kappa}} \right) = \beta\]
where:
An intermediate value \(\theta = \dfrac{|\epsilon|}{s}\) is also defined.
Here’s a worked example:
#
# Results from a pilot test
# (these have been manufactured so as to have a difference of means of 5 and a
# pooled standard deviation of 10)
#
# Sample 1
sample_1 = np.random.normal(25, 10, 1000)
# Sample 2
sample_2 = np.random.normal(30, 10, 1000)
#
# Values set by experimenter
#
# Sampling ratio, κ = n_1 / n_2
kappa = 1
# Type I error rate, α
alpha = 0.05
# Type II error rate, β
beta = 0.20
#
# Calculate theta
#
# Sample size of group 1
n_1 = len(sample_1)
# Sample size of group 2
n_2 = len(sample_2)
# Mean of group 1, μ_1
mu_1 = np.mean(sample_1)
# Mean of group 2, μ_2
mu_2 = np.mean(sample_2)
# Difference of means, ϵ
epsilon = mu_2 - mu_1
# Pooled sample standard deviation
s = np.sqrt(
(
np.sum((np.array(sample_1) - mu_1)**2) +
np.sum((np.array(sample_2) - mu_2)**2)
) / (n_1 + n_2 - 2)
)
theta = abs(epsilon) / s
#
# Solve for n_2
#
# Iterate over potential sample sizes for the smaller group
for n in range(1000):
# Degrees of freedom
dof = n + n - 2
# t-statistic
t = st.t.ppf(1 - (alpha / 2), dof)
# Non-centrality parameter
nc = (np.sqrt(n) * theta) / np.sqrt(1 + 1 / kappa)
# Cumulative distribution
p = st.nct.cdf(t, dof, nc)
# Test if the probability is less than or equal to the Type II error rate
if p <= beta:
break
n_2 = n
n_1 = kappa * n
print(f'The sample sizes must be {n_1} and {n_2}')
## The sample sizes must be 64 and 64
We can check that we are correct by looking at Table 3.2.1 in the source.[1 pg 60] Our worked example had \(\theta = \dfrac{|\epsilon|}{s} = 0.5\), \(\kappa = 1\) and \(1 - \beta = 80\%\). Our worked example also had \(\alpha=5\%\) but the equation on page 59 of the source has \(\alpha / 2\) in it whereas the equation that Table 3.2.1 uses has \(\alpha\) in it so we actually need to look at the \(\alpha=2.5\%\) column to account for this discrepancy. With all of that in mind we get an answer of 64 from the table, which matches the answer we calculated here.
After all that work, we can see that the answer we got when \(\sigma\) was not known (64) is almost the same as the one we got when \(\sigma\) was known (63)! So, for large sample sizes (\(n_1, n_2 > 30\) or so) it doesn’t really make a difference which method is used.
The ‘reverse’ calculation - getting the power from given sample sizes - can also be performed[2]:
# Mean of group 1, μ_1
mu_1 = 5
# Mean of group 2, μ_2
mu_2 = 10
# Sampling ratio, κ = n_1 / n_2
kappa = 1
# Population standard deviation, σ
sigma = 10
# Type I error rate, α
alpha = 0.05
# Smaller of the two groups
n_2 = 63
# Derive the other group's size
n_1 = kappa * n_2
# Power calculation
z = (mu_1 - mu_2) / (sigma * np.sqrt((1 + 1 / kappa) / n_2))
power = st.norm.cdf(z - st.norm.ppf(1 - alpha / 2)) + \
st.norm.cdf(-z - st.norm.ppf(1 - alpha / 2))
print(f'Power = {power:.2f}, beta = {1 - power:.2f} (the type II error rate)')
## Power = 0.80, beta = 0.20 (the type II error rate)
A variant of the comparison of two samples’ means is to perform two one-sided tests (TOST). This is more usually used to establish equivalence (or the lack thereof) rather than the traditional test for equality. For a TOST procedure there are two sets of hypotheses:
and
where \(\mu_1\) and \(\mu_2\) are the means of the two samples. The formulas for the required sample sizes are[3]:
\[n_1 = \left( s_1^2 + s_2^2 / \kappa \right) \left( \dfrac{z_{1 - \alpha} + z_{1 - \beta}}{\mu_1 - \mu_2} \right)^2\] \[n_2 = \kappa n_1\]
where
As per the previous example, for large values of \(n_1\) and \(n_2\) the difference between using sample standard deviations and population standard deviations is inconsequential. As such, sample standard deviations have been used in the formula above (strictly, these should be population standard deviations).
A worked example:
# Mean of group 1, μ_1
mu_1 = 132.86
# Mean of group 2, μ_2
mu_2 = 127.44
# Sample standard deviation of group 1
s_1 = 15.34
# Sample standard deviation of group 2
s_2 = 18.23
# Sampling ratio, κ = n_1 / n_2
kappa = 2
# Type I error rate, α
alpha = 0.05
# Type II error rate, β
beta = 0.2
n_1 = (s_1**2 + s_2**2 / kappa) * \
(
(st.norm.ppf(1 - alpha) + st.norm.ppf(1 - beta)) /
(mu_1 - mu_2)
)**2
n_1 = np.ceil(n_1)
n_2 = kappa * n_1
print(f'The sample sizes must be {n_1} and {n_2}')
## The sample sizes must be 85.0 and 170.0
Chow, SC, Shao, J, Wang, H. Sample Size Calculations in Clinical Research. 2nd Ed. Chapman & Hall/CRC; 2008. DOI: 10.1201/9781584889830. Available here. Jump to reference: ↩︎
HyLown Consulting LLC. Calculate Sample Size Needed to Compare 2 Means: 2-Sample, 2-Sided Equality. Available here. Jump to reference: ↩︎
HyLown Consulting LLC. Calculate Sample Size Needed to Compare 2 Means: 2-Sample, 1-Sided. Available here. Jump to reference: ↩︎