
In theory, if you measure the same thing twice you should get the same reading both times. In reality, however, all measurements will show some variability if repeated. The term repeatability refers to:

The closeness of agreement between measurements when measured under the same conditions[1, 2, 3]

The repeatability coefficient (RC) is a value with the following property: if you make two measurements of the same thing under the same conditions, the difference between those two measurements will be less than the RC in 95% of cases.[4, 5] The smaller the repeatability coefficient, the better.

When we say that the repeated measurements need to be done under “the same conditions”, what we mean is that they need to be done with the same:

  - location
  - operator(s)
  - measuring system

and be done using the same units and over a short period of time.[3, 6]

1 Quantifying Repeatability

As mentioned, the repeatability coefficient is the value under which the difference between any two repeat measurements of the same measurand acquired under repeatability conditions (aka identical conditions) should fall with 95% probability.[5] If we assume Normality (ie that multiple measurements of the same thing produce values that are Normally distributed around its true value) then this can be represented mathematically as:

\(RC = 1.96 \times \sqrt{2\sigma_w^2} = 2.77\sigma_w\)

where \(\sigma_w\) is the within-subject standard deviation: the standard deviation of the values you will get if you measure the same thing multiple times. The \(\sqrt{2}\) appears because the difference between two independent measurements, each with standard deviation \(\sigma_w\), has standard deviation \(\sqrt{2}\sigma_w\), and 95% of a Normal distribution lies within 1.96 standard deviations of its mean. In practice, it is not always feasible to calculate the within-subject standard deviation exactly (it might require too many repeats, which can be expensive) and it isn’t always possible either (ie it might be a function instead of a single number), so it is usually estimated as the square root of the mean of the sample variances of multiple sets of repeated measurements. When estimating in this way, we use the symbol \(s_w\) to differentiate it from the true value of the within-subject standard deviation (which is always \(\sigma_w\)).

In the special case where the measurements are performed exactly twice on each measurand the above equation can be re-written as follows:

\(RC = 1.96 \times \sqrt{\frac{\Sigma\left(m_2 - m_1\right)^2}{n}}\)

where \(m_1\) and \(m_2\) are the two measurements performed on each of the \(n\) measurands.[7, 8] Although this equation looks more complicated than the first, it is actually less computationally intensive. The reason this simplification works is that the variance of two observations happens to equal half the square of their difference.[9]
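
As a quick sanity check, here’s a minimal sketch (using two arbitrary values) that confirms this identity numerically and shows that the two forms of the equation agree for a single pair of measurements:

# Two arbitrary measurements of the same measurand
a <- 3.1
b <- 7.4
# The variance of two observations equals half the square of their difference
print(var(c(a, b)))
## [1] 9.245
print((b - a)^2 / 2)
## [1] 9.245
# Hence the two forms of the RC equation agree for this pair
print(1.96 * sqrt(2 * var(c(a, b))))
## [1] 8.428
print(1.96 * sqrt((b - a)^2))
## [1] 8.428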

2 Repeatability vs Reproducibility

Reproducibility is the closeness of agreement between measurements when measured under different conditions.[2, 3, 6]

Like repeatability, it involves the same measurement being performed multiple times except now with different locations, operators or measuring systems (aka under reproducibility conditions).

Similar to the repeatability coefficient, a reproducibility coefficient (RDC) can be used. The equation for this is \(RDC = 1.96 \times \sqrt{2\sigma_w^2 + \nu^2}\) which is exactly the same as RC with the addition of the \(\nu^2\) term (which is the variability attributed to the differing conditions). The value of \(\nu^2\) depends on what variables change between measurements.[5]
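
As an illustrative sketch (the numbers here are hypothetical, not taken from any of the cited papers), if the within-subject standard deviation were 2 and the differing conditions contributed an extra variance of \(\nu^2 = 1.5\), the RDC would be calculated as:

# Hypothetical within-subject standard deviation
sigma_w <- 2
# Hypothetical variance attributable to the differing conditions
nu_sq <- 1.5
# Reproducibility coefficient
rdc <- 1.96 * sqrt(2 * sigma_w^2 + nu_sq)
print(rdc)
## [1] 6.041126

Since the \(\nu^2\) term is non-negative, the RDC can never be smaller than the corresponding RC (which here would be \(2.77 \times 2 = 5.54\)).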

3 Calculating the Repeatability Coefficient

3.1 Using the True Standard Deviation \(\sigma_w\)

Here’s an example that ostensibly uses the true value of the within-subject standard deviation. In reality, however, it’s impossible to know the true value unless you have an infinite number of measurements, so this is still just an estimate:

# Fake data: 10 hypothetical measurements of some measurand
measurements <- c(13.63, 6.82, 10.76, 11.03, 11.49, 11.31, 7.33, 13.30, 9.77, 8.79)
# Within-subject sample standard deviation
sigma_w <- sd(measurements)
# Repeatability coefficient
rc <- 1.96 * sqrt(2) * sigma_w
print(rc)
## [1] 6.308008

The 10 numbers used in the above example were generated by sampling from a Normal distribution with mean 10 and standard deviation 2. The true value of \(\sigma_w\) for this example was thus 2 and the ‘correct’ value of RC was therefore \(2.77\times 2 = 5.54\).

Notice that in the above example it was the sample standard deviation that was calculated, not the population standard deviation.
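
The numbers above were presumably generated with rnorm, although the seed used isn’t recorded. Here’s a sketch of how such data could be produced; with a much larger number of repeats, the estimate should land much closer to the ‘correct’ value:

# Set a seed so this sketch is reproducible (the seed is an arbitrary choice)
set.seed(20)
# 10,000 hypothetical measurements sampled from a Normal(10, 2) distribution
measurements <- rnorm(10000, mean = 10, sd = 2)
# Repeatability coefficient; should now be close to the ‘correct’ value of 5.54
rc <- 1.96 * sqrt(2) * sd(measurements)
print(rc)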

3.2 Using the Estimated Standard Deviation \(s_w\)

In this example we only have two repeat measurements of each measurand, as opposed to the 10 we had in the first example. To make up for it, however, we have performed these repeats on 7 different measurands:

# Fake data: two hypothetical measurements of seven different measurands
repeats <- list(
    c(2.73, 2.01),
    c(1.93, 9.10),
    c(5.47, 7.36),
    c(11.71, 8.26),
    c(10.44, 9.05),
    c(10.66, 9.97),
    c(12.66, 10.01)
)
# Within-subject sample variances, one per measurand
variances <- sapply(repeats, var)
# Mean within-subject sample variance
var_w <- mean(variances)
# Within-subject sample standard deviation
s_w <- sqrt(var_w)
# Coefficient of repeatability
rc <- 1.96 * sqrt(2) * s_w
print(rc)
## [1] 6.493515

The numbers used in this example were again taken from Normal distributions, this time with means of 5 through to 11 and all with standard deviations of 2. So the ‘correct’ answer for RC was again \(2.77\times 2 = 5.54\).

Note that on this occasion we couldn’t calculate \(s_w\) immediately because you can’t meaningfully take the average of a list of standard deviations. We first had to calculate the sample variances, average those, and then take the square root to get \(s_w\).
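
To see why, here’s a small numerical sketch using two pairs of repeats with different spreads: averaging the standard deviations directly gives a different (and incorrect) answer to taking the square root of the mean variance:

# Standard deviations of two sets of repeats with different spreads
sds <- c(sd(c(1, 5)), sd(c(1, 11)))
# Averaging the standard deviations directly (incorrect)
print(mean(sds))
## [1] 4.949747
# Square root of the mean of the variances (correct)
print(sqrt(mean(sds^2)))
## [1] 5.385165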

3.3 Bland-Altman (1986)

Most journal papers will cite one of Bland & Altman’s papers when they are performing repeatability analyses. Because these papers are so important in this area we will use them for the following examples. This next one uses their 1986 paper[7]; the data below comes from page 2 of that paper. This example also uses R’s data frame object type, which makes it easier to perform calculations on tabular data:

# Raw data
wright_large <- data.frame(
    `First Measurement` = c(
        494, 395, 516, 434, 476, 557, 413, 442, 650, 433, 417, 656, 267, 478, 178, 423, 427
    ),
    `Second Measurement` = c(
        490, 397, 512, 401, 470, 611, 415, 431, 638, 429, 420, 633, 275, 492, 165, 372, 421
    ), check.names = FALSE
)
# Within-subject sample variances
var_w <- apply(wright_large[c('First Measurement', 'Second Measurement')], 1, var)
# Mean within-subject sample variance
var_w <- mean(var_w)
# Within-subject sample standard deviation
s_w <- sqrt(var_w)
# Coefficient of repeatability
rc <- 1.96 * sqrt(2) * s_w
print(rc)
## [1] 42.42792

The second example from the same paper:

# Raw data
wright_mini <- data.frame(
    `First Measurement` = c(
        512, 430, 520, 428, 500, 600, 364, 380, 658, 445, 432, 626, 260, 477, 259, 350, 451
    ),
    `Second Measurement` = c(
        525, 415, 508, 444, 500, 625, 460, 390, 642, 432, 420, 605, 227, 467, 268, 370, 443
    ), check.names = FALSE
)
# Within-subject sample variances
var_w <- apply(wright_mini[c('First Measurement', 'Second Measurement')], 1, var)
# Mean within-subject sample variance
var_w <- mean(var_w)
# Within-subject sample standard deviation
s_w <- sqrt(var_w)
# Coefficient of repeatability
rc <- 1.96 * sqrt(2) * s_w
print(rc)
## [1] 55.19001

Note that Bland & Altman actually rounded off a bit when they did these calculations and also used a distance of 2 standard deviations from the mean instead of the more accurate 1.96. As a result, their answers are 43.2 l/min for the large meter and 56.4 l/min for the mini meter whereas ours are 42.4 and 55.2, respectively. Additionally, they used the simplified version of the calculation that only works when exactly two repeats have been performed, as mentioned above. For completeness, here is their method:

# Measurement differences
diffs <- wright_large[['First Measurement']] - wright_large[['Second Measurement']]
# Squared differences
sq_diffs <- diffs**2
# Mean squared difference (equal to twice the within-subject variance)
msd <- mean(sq_diffs)
# Coefficient of repeatability: 1.96 × sqrt(msd) = 1.96 × sqrt(2) × s_w
rc <- 1.96 * sqrt(msd)
print(rc)
## [1] 42.42792
# Measurement differences
diffs <- wright_mini[['First Measurement']] - wright_mini[['Second Measurement']]
# Squared differences
sq_diffs <- diffs**2
# Total of the squared differences
print(sum(sq_diffs))
## [1] 13479
# Mean squared difference (equal to twice the within-subject variance)
msd <- mean(sq_diffs)
# Coefficient of repeatability
rc <- 1.96 * sqrt(msd)
print(rc)
## [1] 55.19001

The above is also calculated on the MedCalc ‘Bland-Altman plot’ page[8] except that they use 2 standard deviations instead of 1.96, hence their answer is 56.3163 as opposed to our 55.1900.
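
Rescaling our answer from 1.96 standard deviations to 2 reproduces their number:

# Convert the coefficient from 1.96 standard deviations to 2
print(rc * 2 / 1.96)
## [1] 56.31634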

3.4 Bland-Altman (1996)

In their 1996 paper[9], Bland and Altman demonstrate how to calculate the repeatability coefficient when multiple repeats of the same measurement have been performed on each measurand. Instead of copy-pasting the same code we used above to replicate this example, let’s create a function to do it:

repeatability_analysis <- function(df) {
    columns <- colnames(df)
    df[['Sample Standard Deviation']] <- apply(df[columns], 1, sd)
    df[['Sample Variance']] <- df[['Sample Standard Deviation']]**2

    # Create a summary
    summary <- data.frame()
    # Means of the raw measurement columns
    for (column in columns) {
        label <- paste('Mean of', column)
        summary[1, label] <- mean(df[[column]])
    }
    # Sample size
    summary[1, 'N'] <- nrow(df)
    # Degrees of freedom
    summary[1, 'DoF'] <- nrow(df) - 1
    # Mean variance
    summary[1, 'Mean Variance'] <- mean(df[['Sample Variance']])
    # Within-subject standard deviation
    s_w <- sqrt(mean(df[['Sample Variance']]))
    summary[1, 'Within-Subject SD (Sw)'] <- s_w
    # Coefficient of repeatability
    col <- 'Repeatability Coefficient (RC)'
    summary[1, col] <- sqrt(2) * 1.96 * s_w

    return(list(df = df, summary = summary))
}


# Raw data
df <- data.frame(
    'Measurement 1' = c(
        190, 220, 260, 210, 270, 280, 260, 275, 280, 320, 300, 270, 320, 335, 350, 360, 330, 335, 400, 430
    ),
    'Measurement 2' = c(
        220, 200, 260, 300, 265, 280, 280, 275, 290, 290, 300, 250, 330, 320, 320, 320, 340, 385, 420, 460
    ),
    'Measurement 3' = c(
        200, 240, 240, 280, 280, 270, 280, 275, 300, 300, 310, 330, 330, 335, 340, 350, 380, 360, 425, 480
    ),
    'Measurement 4' = c(
        200, 230, 280, 265, 270, 275, 300, 305, 290, 290, 300, 370, 330, 375, 365, 345, 390, 370, 420, 470
    ), check.names = FALSE
)
# Calculate agreement statistics
output <- repeatability_analysis(df)
print(output$summary)
##   Mean of Measurement 1 Mean of Measurement 2 Mean of Measurement 3
## 1                299.75                305.25                315.25
##   Mean of Measurement 4  N DoF Mean Variance Within-Subject SD (Sw)
## 1                   322 20  19      460.5208               21.45975
##   Repeatability Coefficient (RC)
## 1                       59.48339
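
In their paper, Bland and Altman obtain the within-subject standard deviation as the square root of the residual mean square from a one-way analysis of variance. As a cross-check, here’s a sketch of that approach (the reshaping into ‘long’ format is an assumption about how you would organise the data in R); with an equal number of repeats per measurand the residual mean square equals the mean within-subject sample variance, so the result matches:

# Reshape the raw data into long format: one row per individual measurement
long <- data.frame(
    subject = factor(rep(seq_len(nrow(df)), times = ncol(df))),
    value = unlist(df, use.names = FALSE)
)
# One-way ANOVA; the residual mean square estimates the within-subject variance
tbl <- anova(lm(value ~ subject, data = long))
# Within-subject standard deviation
s_w <- sqrt(tbl['Residuals', 'Mean Sq'])
# Coefficient of repeatability
print(1.96 * sqrt(2) * s_w)
## [1] 59.48339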

3.5 Bland-Altman (1999)

Now that we have a function, let’s use it on the data from Bland and Altman’s 1999 paper.[10] These are systolic blood pressure measurements in mmHg taken by an operator J and a machine S:

# Raw data
j <- data.frame(
    'Measurement 1' = c(
        100, 108, 76, 108, 124, 122, 116, 114, 100, 108, 100, 108, 112, 104, 106, 122, 100, 118, 140, 150, 166, 148,
        174, 174, 140, 128, 146, 146, 220, 208, 94, 114, 126, 124, 110, 90, 106, 218, 130, 136, 100, 100, 124, 164, 100,
        136, 114, 148, 160, 84, 156, 110, 100, 100, 86, 106, 108, 168, 166, 146, 204, 96, 134, 138, 134, 156, 124, 114,
        112, 112, 202, 132, 158, 88, 170, 182, 112, 120, 110, 112, 154, 116, 108, 106, 122
    ),
    'Measurement 2' = c(
        106, 110, 84, 104, 112, 140, 108, 110, 108, 92, 106, 112, 112, 108, 108, 122, 102, 118, 134, 148, 154, 156,
        172, 166, 144, 134, 138, 152, 218, 200, 84, 124, 120, 124, 120, 90, 106, 202, 128, 136, 96, 98, 116, 168, 102,
        126, 108, 120, 150, 92, 162, 98, 106, 102, 74, 100, 110, 188, 150, 142, 198, 94, 126, 144, 136, 160, 138, 110,
        116, 116, 220, 136, 162, 76, 174, 176, 114, 118, 108, 112, 134, 112, 110, 98, 112
    ),
    'Measurement 3' = c(
        107, 108, 82, 104, 112, 124, 102, 112, 112, 100, 104, 122, 110, 104, 102, 114, 102, 120, 138, 144, 154, 134,
        166, 150, 144, 130, 140, 148, 220, 192, 86, 116, 122, 132, 128, 94, 110, 208, 130, 130, 88, 88, 122, 154, 100,
        122, 122, 132, 148, 98, 152, 98, 106, 94, 76, 110, 106, 178, 154, 132, 188, 86, 124, 140, 142, 154, 138, 114,
        122, 134, 228, 134, 152, 88, 176, 180, 124, 120, 106, 106, 130, 94, 114, 100, 112
    ), check.names = FALSE
)

# Calculate agreement statistics
output <- repeatability_analysis(j)
print(output$summary)
##   Mean of Measurement 1 Mean of Measurement 2 Mean of Measurement 3  N DoF
## 1              128.5412              127.2941              126.3882 85  84
##   Mean Variance Within-Subject SD (Sw) Repeatability Coefficient (RC)
## 1      37.40784               6.116195                       16.95323
# Raw data
s <- data.frame(
    'Measurement 1' = c(
        122, 121, 95, 127, 140, 139, 122, 130, 119, 126, 107, 123, 131, 123, 127, 142, 104, 117, 139, 143, 181, 149,
        173, 160, 158, 139, 153, 138, 228, 190, 103, 131, 131, 126, 121, 97, 116, 215, 141, 153, 113, 109, 145, 192,
        112, 152, 141, 206, 151, 112, 162, 117, 119, 136, 112, 120, 117, 194, 167, 173, 228, 77, 154, 154, 145, 200,
        188, 149, 136, 128, 204, 184, 163, 93, 178, 202, 162, 227, 133, 202, 158, 124, 114, 137, 121
    ),
    'Measurement 2' = c(
        128, 127, 94, 127, 131, 142, 112, 129, 122, 113, 113, 125, 129, 126, 119, 133, 116, 113, 127, 155, 170, 156,
        170, 155, 152, 144, 150, 144, 228, 183, 99, 131, 123, 129, 114, 94, 121, 201, 133, 143, 107, 105, 102, 178,
        116, 144, 141, 188, 147, 125, 165, 118, 131, 116, 115, 118, 118, 191, 160, 161, 218, 89, 156, 155, 154, 180,
        147, 217, 132, 125, 222, 187, 160, 88, 181, 199, 166, 227, 127, 190, 121, 149, 118, 135, 123
    ),
    'Measurement 3' = c(
        124, 128, 98, 135, 124, 136, 112, 135, 122, 111, 111, 125, 122, 114, 126, 137, 115, 112, 113, 133, 166, 140,
        154, 170, 154, 141, 154, 131, 226, 184, 106, 124, 124, 125, 125, 96, 127, 207, 146, 138, 102, 97, 137, 171,
        116, 147, 137, 166, 136, 124, 189, 109, 124, 113, 104, 132, 115, 196, 161, 154, 189, 101, 141, 148, 166, 179,
        139, 192, 133, 142, 224, 192, 152, 88, 181, 195, 148, 219, 126, 213, 134, 137, 126, 134, 128
    ), check.names = FALSE
)

# Calculate agreement statistics
output <- repeatability_analysis(s)
print(output$summary)
##   Mean of Measurement 1 Mean of Measurement 2 Mean of Measurement 3  N DoF
## 1              144.8353              142.7412              141.5059 85  84
##   Mean Variance Within-Subject SD (Sw) Repeatability Coefficient (RC)
## 1      83.14118               9.118178                        25.2743

As the machine (S) has an RC of 25.27 mmHg while the operator (J) has an RC of 16.95 mmHg, the machine’s repeatability coefficient is 49% greater than the operator’s; in other words, the machine’s repeatability is worse.
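
That percentage comes from the ratio of the two coefficients:

# Ratio of the machine's RC to the operator's RC
print(25.2743 / 16.95323)
## [1] 1.490825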

4 References

  1. Joint Committee for Guides in Metrology. Evaluation of measurement data – Guide to the expression of uncertainty in measurement. JCGM. 2008;100. Online.
  2. Algorithm Comparison Working Group. Quantitative imaging biomarkers: a review of statistical methods for computer algorithm comparisons. Stat Methods Med Res. 2015;24(1):68–106. DOI: 10.1177/0962280214537390.
  3. QIBA Terminology Working Group. The Emerging Science of Quantitative Imaging Biomarkers: Terminology and Definitions for Scientific Studies and for Regulatory Submissions. Stat Methods Med Res. 2015;24(1):9–26. DOI: 10.1177/0962280214537333.
  4. British Standards Institution. Precision of test methods 1: Guide for the determination and reproducibility for a standard test method. British Standards. 1975;597(Part 1).
  5. Quantitative Imaging Biomarkers Alliance (QIBA). Indices of Repeatability, Reproducibility, and Agreement. 2013. Online.
  6. Barnhart HX, Haber MJ, Lin LI. An overview on assessing agreement with continuous measurements. J Biopharm Stat. 2007;17(4):529–69. DOI: 10.1080/10543400701376480.
  7. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986 Feb;327(8476):307–10. DOI: 10.1016/S0140-6736(86)90837-8.
  8. Medcalc. Bland-Altman plot. 2020. Online.
  9. Bland JM, Altman DG. Measurement error. BMJ. 1996;313(7059):744. DOI: 10.1136/bmj.313.7059.744.
  10. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8(2):135–60.
