
Bland-Altman analysis is used to assess the agreement between two methods of measuring something, usually clinical information. It was described in Bland & Altman’s 1986 paper[1]; see also the Wikipedia page on the topic.

Giavarina analysis is identical to Bland-Altman analysis except that it accounts for heteroscedasticity. It does this by using percentage differences (relative to the means) on the y-axis instead of raw differences. It was published in Giavarina’s 2015 paper[2].

Calculating agreement is useful when comparing two methods or devices that measure the same thing, as the worked examples below demonstrate.

1 Where to Start: Bland & Altman’s Difference Plot

The example shown in Bland & Altman (1986) uses data created especially for the paper. Bland measured the maximum speed of expiration (peak expiratory flow rate or PEFR) of 17 people, mainly his family and friends, using two different devices: a large and a mini Wright peak flow meter. This data is shown below:

df <- data.frame(
    `Wright Mini` = c(
        512, 430, 520, 428, 500, 600, 364, 380, 658,
        445, 432, 626, 260, 477, 259, 350, 451
    ),
    `Wright Large` = c(
        494, 395, 516, 434, 476, 557, 413, 442, 650,
        433, 417, 656, 267, 478, 178, 423, 427
    ),
    check.names = FALSE
)

1.1 Visualise the Data

Here are the measurements taken by the two devices plotted against one another (this reproduces Figure 1 of Bland & Altman (1986)):

# Make the plot square
par(pty="s")
# Plot
plot(
    # Data to plot
    df[["Wright Large"]], df[["Wright Mini"]], pch = 21, col = "black", bg="gray",
    # Axis labels
    main = "Peak Expiratory Flow Rate", xlab = "Large Meter (L/min)", ylab = "Mini Meter (L/min)",
    # Axis control
    xlim = c(0, 700), ylim = c(0, 700), xaxs = "i", yaxs = "i"
)
# Reference line
abline(0, 1, lty = "dashed", col = "gray18")
# Include a legend
legend(0, 700, "Line of Equality", lty = "dashed", col = "gray18")

1.2 Time for Regression Analysis?

As Bland and Altman say in their paper, at this point it’s usual to calculate the correlation coefficient (r) between the two methods. Doing so gives r = 0.94 (p < 0.001). They continue: “The null hypothesis here is that the measurements by the two methods are not linearly related. The probability is very small and we can safely conclude that PEFR measurements by the mini and large meters are related. However, this high correlation does not mean that the two methods agree”. To paraphrase their points (a short code sketch reproducing these correlations follows this list):

  • r measures how correlated two variables are, not the extent to which they agree. Perfect agreement implies that all the points lie along the line of equality, while a large r value merely implies that they all lie on a straight line (which could be any straight line)
  • Correlation depends on the range over which you test. If we only look at the data below 500 L/min or only the data above 500 L/min we get smaller values of r (0.88 and 0.90 respectively) than when we look at all the data together (r = 0.94). However, it would be absurd to argue that agreement is worse below 500 L/min and worse above 500 L/min than it is for everybody.
  • The test of significance may show that the two methods are related, but it would be amazing if two methods designed to measure the same quantity were not related! The test of significance is irrelevant to the question of agreement.
  • The r statistic is difficult to interpret: is an r value of 0.992 much worse than one of 0.996? How much worse?
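
The correlations quoted above can be reproduced with a short sketch in R, using the df data frame defined earlier. Note that Bland & Altman do not say exactly how the data were split at 500 L/min, so splitting on the mean of each pair of readings here is an assumption:

# Overall Pearson correlation between the two meters
cor.test(df[["Wright Large"]], df[["Wright Mini"]])
# Correlations within sub-ranges, splitting at 500 L/min
# (using the mean of each pair of readings for the split is an assumption)
mean_pefr <- (df[["Wright Large"]] + df[["Wright Mini"]]) / 2
cor(df[["Wright Large"]][mean_pefr < 500], df[["Wright Mini"]][mean_pefr < 500])
cor(df[["Wright Large"]][mean_pefr >= 500], df[["Wright Mini"]][mean_pefr >= 500])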

All of the above leads us towards a different way of measuring agreement:

1.3 Bland-Altman Analysis

In essence, if we are interested in knowing to what extent two measurement methods differ we should calculate the average difference between the values they produce when measuring the same participant. If this is small it means that it is effectively irrelevant which method you use; both will yield a similar result. The two methods or devices can thus be used interchangeably, or one can be used instead of the other if it is preferable for whatever reason. To check that this measurement difference is not related to the actual value that is being measured, it is useful to plot it against the mean value of the two values (ideally you would plot it against the true value, but as this is unknown the mean value that was measured is your best guess).

First calculate the means and differences. This is the data that will be plotted:

means <- (df[["Wright Large"]] + df[["Wright Mini"]]) / 2
diffs <- df[["Wright Large"]] - df[["Wright Mini"]]

Then find the average difference and the standard deviation of the differences. These are good indicators of whether or not a method is biased and how consistent this bias is, respectively:

bias <- mean(diffs)
sd <- sd(diffs)

If we assume that the differences are Normally distributed then approximately 95% of them will lie within 1.96 standard deviations of the mean (this is often rounded to 2, as is done below). The endpoints of this 95% range are known as the “limits of agreement” (LOA):

upper_loa <- bias + 2 * sd
lower_loa <- bias - 2 * sd
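
As an aside, the 1.96 figure is simply the two-sided 95% quantile of the standard Normal distribution, which can be confirmed directly:

qnorm(0.975)
## [1] 1.959964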

Now we can plot this data (this plot reproduces Figure 2 of Bland & Altman (1986)):

# Maximise the size of the plot
par(pty = "m")
# Get domain and range
domain <- max(means) - min(means)
range <- max(diffs) - min(diffs)
# Scatter plot
plot(
    # Data to plot
    means, diffs,
    # Axis labels
    main = "Bland-Altman Plot for Two Methods of Measuring PEFR", xlab = "Mean (L/min)", ylab = "Difference (L/min)",
    pch = 21, col = "black", bg="gray",
    # Axis control
    xlim = c(min(means) - 0.1 * domain, max(means) + 0.2 * domain),
    ylim = c(min(diffs) - 0.1 * range, max(diffs) + 0.1 * range),
    xaxs = "i", yaxs = "i"
)
# Zero line
abline(h = 0, col = "darkgrey")
# Upper confidence interval
abline(h = upper_loa, lty = "dashed", col = "gray18")
# Bias
abline(h = bias, lty = "dashed", col = "gray18")
# Lower confidence interval
abline(h = lower_loa, lty = "dashed", col = "gray18")
# Upper confidence interval labels
text(max(means) + 0.13 * domain, upper_loa + 0.04 * range, labels = "+2×SD")
text(max(means) + 0.13 * domain, upper_loa - 0.04 * range, labels = sprintf("%+4.2f", upper_loa))
# Bias labels
text(max(means) + 0.13 * domain, bias + 0.04 * range, labels = "Bias")
text(max(means) + 0.13 * domain, bias - 0.04 * range, labels = sprintf("%+4.2f", bias))
# Lower confidence interval labels
text(max(means) + 0.13 * domain, lower_loa + 0.04 * range, labels = "-2×SD")
text(max(means) + 0.13 * domain, lower_loa - 0.04 * range, labels = sprintf("%+4.2f", lower_loa))

1.4 Confidence Intervals

Neither the bias line nor the limits of agreement are known with certainty. The standard error of the bias can be estimated as:

\({se} = \sqrt{\frac{ {sd}^2}{n}}\)

where \(sd\) is the standard deviation of the differences and \(n\) is the sample size. Similarly, the standard errors of the limits of agreement are approximately:

\({se} = \sqrt{\frac{3{sd}^2}{n}}\)

Using the Student’s t distribution we can find the critical t-values that correspond to a 95% confidence interval given \(n-1\) degrees of freedom. These then give us the confidence intervals around the bias and LOA lines as follows:

\({CI} = y\pm\left(t\times{se}\right)\)

In R, these calculations look as follows:

# Sample size
n <- nrow(df)
# We want 95% confidence intervals
conf_int <- 0.95
# Endpoints of the range that contains 95% of the Student’s t distribution
t1 <- qt((1 - conf_int) / 2, df = n - 1)
t2 <- qt((conf_int + 1) / 2, df = n - 1)
# Variance
var <- sd**2
# Standard error of the bias
se_bias <- sqrt(var / n)
# Standard error of the limits of agreement
se_loas <- sqrt(3 * var / n)
# Confidence intervals
upper_loa_ci_lower <- upper_loa + t1 * se_loas
upper_loa_ci_upper <- upper_loa + t2 * se_loas
bias_ci_lower <- bias + t1 * se_bias
bias_ci_upper <- bias + t2 * se_bias
lower_loa_ci_lower <- lower_loa + t1 * se_loas
lower_loa_ci_upper <- lower_loa + t2 * se_loas

These can now be added to the plot:

# Maximise the size of the plot
par(pty = "m")
# Get domain and range
domain <- max(means) - min(means)
range <- max(diffs) - min(diffs)
# Scatter plot
plot(
    # Data to plot
    means, diffs,
    # Axis labels
    main = "Bland-Altman Plot for Two Methods of Measuring PEFR", xlab = "Mean (L/min)", ylab = "Difference (L/min)",
    pch = 21, col = "black", bg="gray",
    # Axis control
    xlim = c(min(means) - 0.1 * domain, max(means) + 0.2 * domain),
    ylim = c(min(diffs) - 0.3 * range, max(diffs) + 0.3 * range),
    xaxs = "i", yaxs = "i"
)
# Zero line
abline(h = 0, col = "darkgrey")
# Upper confidence interval
abline(h = upper_loa, lty = "dashed", col = "gray18")
# Bias
abline(h = bias, lty = "dashed", col = "gray18")
# Lower confidence interval
abline(h = lower_loa, lty = "dashed", col = "gray18")
# Upper confidence interval labels
text(max(means) + 0.13 * domain, upper_loa + 0.04 * range, labels = "+2×SD")
text(max(means) + 0.13 * domain, upper_loa - 0.04 * range, labels = sprintf("%+4.2f", upper_loa))
# Bias labels
text(max(means) + 0.13 * domain, bias + 0.04 * range, labels = "Bias")
text(max(means) + 0.13 * domain, bias - 0.04 * range, labels = sprintf("%+4.2f", bias))
# Lower confidence interval labels
text(max(means) + 0.13 * domain, lower_loa + 0.04 * range, labels = "-2×SD")
text(max(means) + 0.13 * domain, lower_loa - 0.04 * range, labels = sprintf("%+4.2f", lower_loa))
# X-values for confidence interval lines
left <- min(means) - 0.08 * domain
mid <- min(means) - 0.05 * domain
right <- min(means) - 0.02 * domain
# Upper confidence interval lines
segments(left, upper_loa_ci_upper, x1 = right, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, upper_loa_ci_lower, x1 = mid, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, upper_loa_ci_lower, x1 = right, y1 = upper_loa_ci_lower, lty = "dashed", col = "gray68")
# Bias confidence interval lines
segments(left, bias_ci_upper, x1 = right, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(mid, bias_ci_lower, x1 = mid, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(left, bias_ci_lower, x1 = right, y1 = bias_ci_lower, lty = "dashed", col = "gray68")
# Lower confidence interval lines
segments(left, lower_loa_ci_upper, x1 = right, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, lower_loa_ci_lower, x1 = mid, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, lower_loa_ci_lower, x1 = right, y1 = lower_loa_ci_lower, lty = "dashed", col = "gray68")

2 Use Percentage Differences: Giavarina Analysis

This example reproduces the one used in Giavarina (2015).

Imagine a situation where two methods are used to measure the same quantity and the two readings differ by 10 units. If this represents a large proportional difference - e.g. if one reading was 10 and the other was 20 - then the agreement between the methods is poor. On the other hand, if it represents only a small proportional difference - e.g. if one reading was 1000 and the other was 1010 - then we might not care; we could say the agreement is good. Traditional Bland-Altman analysis does not capture this distinction: it treats all differences as equally consequential to the overall agreement. Giavarina analysis addresses this and, as a result, is more appropriate for data that is heteroscedastic - i.e. data that becomes more spread out as the readings become larger.
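
To make this idea concrete, here is a minimal sketch of the percentage-difference calculation using the two hypothetical pairs of readings just mentioned (the pct_diff helper is purely illustrative and not part of Giavarina’s paper):

# Percentage difference relative to the mean of the two readings
pct_diff <- function(a, b) (b - a) / mean(c(a, b)) * 100
pct_diff(10, 20)     # roughly +67%: a large relative disagreement
pct_diff(1000, 1010) # roughly +1%: a negligible relative disagreement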

The following hypothetical data is used as an example in Giavarina (2015):

df <- data.frame(
    `Method B` = c(
        8, 16, 30, 24, 39, 54, 40, 68, 72, 62, 122, 80,
        181, 259, 275, 380, 320, 434, 479, 587, 626, 648,
        738, 766, 793, 851, 871, 957, 1001, 960
    ),
    `Method A` = c(
        1, 5, 10, 20, 50, 40, 50, 60, 70, 80, 90, 100,
        150, 200, 250, 300, 350, 400, 450, 500, 550, 600,
        650, 700, 750, 800, 850, 900, 950, 1000
    ),
    check.names = FALSE
)

2.1 Regression Analysis

Just as in the Bland-Altman paper, the first recommended step is to visualise the data. Giavarina takes the extra step of performing Passing-Bablok regression, a non-parametric method of fitting a straight line to method-comparison data:

library(mcr)
library(scales)

# Comparison of two measurement methods using regression analysis
# (specifically, use the "PaBa" or Passing-Bablok method)
value <- mcreg(df[["Method A"]], df[["Method B"]], method.reg = "PaBa")
# Passing-Bablok parameters
intercept_est <- value@para[1]
intercept_lci <- value@para[5]
intercept_uci <- value@para[7]
gradient_est <- value@para[2]
gradient_lci <- value@para[6]
gradient_uci <- value@para[8]

# Make the plot square
par(pty="s")
# Scatter plot
plot(
    # Data to plot
    df[["Method A"]], df[["Method B"]], pch = 21, col = "black", bg="gray",
    # Axis labels
    main = "Regression Analysis of Hypothetical Data", xlab = "Method A", ylab = "Method B",
    # Axis control
    xlim = c(0, 1050), ylim = c(0, 1050), xaxs = "i", yaxs = "i"
)
# Fill in colour between the confidence intervals
mylims <- par("usr")
x <- c(mylims[1], mylims[2], mylims[2], mylims[1])
y <- c(
    gradient_lci * x[1] + intercept_lci,
    gradient_lci * x[2] + intercept_lci,
    gradient_uci * x[2] + intercept_uci,
    gradient_uci * x[1] + intercept_uci
)
polygon(x, y, col = alpha("dodgerblue", 0.2))
# Reference line
abline(0, 1, lty = "dashed", col = "gray18")
# Passing-Bablok regression lines
abline(intercept_est, gradient_est, col = "dodgerblue", lwd = 2)
abline(intercept_lci, gradient_lci, col = alpha("dodgerblue", 1))
abline(intercept_uci, gradient_uci, col = alpha("dodgerblue", 1))
# Include a legend
legend(
    0, 1050,
    c(
        "Line of Equality",
        sprintf("%4.2fx + %4.2f", gradient_est, intercept_est),
        sprintf("Upper CI: %4.2fx + %4.2f", gradient_uci, intercept_uci),
        sprintf("Lower CI: %4.2fx + %4.2f", gradient_lci, intercept_lci)
    ),
    lty = c("dashed", "solid", "solid", "solid"),
    lwd = c(1, 2, 1, 1),
    col = c("gray18", "dodgerblue", alpha("dodgerblue", 1), alpha("dodgerblue", 1)),
    cex = 0.8
)

The above reproduces Figure 1 from Giavarina (2015).

2.2 Bland-Altman Analysis

Figure 6 in Giavarina (2015) is a traditional Bland-Altman plot.

Summary statistics:

means <- (df[["Method A"]] + df[["Method B"]]) / 2
diffs <- df[["Method A"]] - df[["Method B"]]
bias <- mean(diffs)
sd <- sd(diffs)
upper_loa <- bias + 1.96 * sd
lower_loa <- bias - 1.96 * sd

Confidence intervals:

# Sample size
n <- nrow(df)
# We want 95% confidence intervals
conf_int <- 0.95
# Endpoints of the range that contains 95% of the Student’s t distribution
t1 <- qt((1 - conf_int) / 2, df = n - 1)
t2 <- qt((conf_int + 1) / 2, df = n - 1)
# Variance
var <- sd**2
# Standard error of the bias
se_bias <- sqrt(var / n)
# Standard error of the limits of agreement
se_loas <- sqrt(3 * var / n)
# Confidence intervals
upper_loa_ci_lower <- upper_loa + t1 * se_loas
upper_loa_ci_upper <- upper_loa + t2 * se_loas
bias_ci_lower <- bias + t1 * se_bias
bias_ci_upper <- bias + t2 * se_bias
lower_loa_ci_lower <- lower_loa + t1 * se_loas
lower_loa_ci_upper <- lower_loa + t2 * se_loas

Plot:

# Maximise the size of the plot
par(pty = "m")
# Get domain and range
domain <- max(means) - min(means)
range <- max(diffs) - min(diffs)
# Scatter plot
plot(
    # Data to plot
    means, diffs,
    # Axis labels
    main = "Bland-Altman Plot for Two Hypothetical Measurement Methods", xlab = "Mean", ylab = "Difference",
    pch = 21, col = "black", bg="gray",
    # Axis control
    xlim = c(min(means) - 0.1 * domain, max(means) + 0.25 * domain),
    ylim = c(min(diffs) - 0.3 * range, max(diffs) + 0.3 * range),
    xaxs = "i", yaxs = "i"
)
# Zero line
abline(h = 0, col = "darkgrey")
# Upper confidence interval
abline(h = upper_loa, lty = "dashed", col = "gray18")
# Bias
abline(h = bias, lty = "dashed", col = "gray18")
# Lower confidence interval
abline(h = lower_loa, lty = "dashed", col = "gray18")
# Upper confidence interval labels
text(max(means) + 0.13 * domain, upper_loa + 0.04 * range, labels = "+1.96×SD")
text(max(means) + 0.13 * domain, upper_loa - 0.04 * range, labels = sprintf("%+4.2f", upper_loa))
# Bias labels
text(max(means) + 0.13 * domain, bias + 0.04 * range, labels = "Bias")
text(max(means) + 0.13 * domain, bias - 0.04 * range, labels = sprintf("%+4.2f", bias))
# Lower confidence interval labels
text(max(means) + 0.13 * domain, lower_loa + 0.04 * range, labels = "-1.96×SD")
text(max(means) + 0.13 * domain, lower_loa - 0.04 * range, labels = sprintf("%+4.2f", lower_loa))
# X-values for confidence interval lines
left <- min(means) - 0.08 * domain
mid <- min(means) - 0.05 * domain
right <- min(means) - 0.02 * domain
# Upper confidence interval lines
segments(left, upper_loa_ci_upper, x1 = right, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, upper_loa_ci_lower, x1 = mid, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, upper_loa_ci_lower, x1 = right, y1 = upper_loa_ci_lower, lty = "dashed", col = "gray68")
# Bias confidence interval lines
segments(left, bias_ci_upper, x1 = right, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(mid, bias_ci_lower, x1 = mid, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(left, bias_ci_lower, x1 = right, y1 = bias_ci_lower, lty = "dashed", col = "gray68")
# Lower confidence interval lines
segments(left, lower_loa_ci_upper, x1 = right, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, lower_loa_ci_lower, x1 = mid, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, lower_loa_ci_lower, x1 = right, y1 = lower_loa_ci_lower, lty = "dashed", col = "gray68")

2.3 Giavarina Analysis

The percentage differences are calculated relative to the means:

means <- (df[["Method A"]] + df[["Method B"]]) / 2
diffs <- df[["Method A"]] - df[["Method B"]]
percent_diffs <- diffs / means * 100
bias <- mean(percent_diffs)
sd <- sd(percent_diffs)
upper_loa <- bias + 1.96 * sd
lower_loa <- bias - 1.96 * sd

Confidence intervals:

# Sample size
n <- nrow(df)
# We want 95% confidence intervals
conf_int <- 0.95
# Endpoints of the range that contains 95% of the Student’s t distribution
t1 <- qt((1 - conf_int) / 2, df = n - 1)
t2 <- qt((conf_int + 1) / 2, df = n - 1)
# Variance
var <- sd**2
# Standard error of the bias
se_bias <- sqrt(var / n)
# Standard error of the limits of agreement
se_loas <- sqrt(3 * var / n)
# Confidence intervals
upper_loa_ci_lower <- upper_loa + t1 * se_loas
upper_loa_ci_upper <- upper_loa + t2 * se_loas
bias_ci_lower <- bias + t1 * se_bias
bias_ci_upper <- bias + t2 * se_bias
lower_loa_ci_lower <- lower_loa + t1 * se_loas
lower_loa_ci_upper <- lower_loa + t2 * se_loas

Plot (this reproduces Figure 7 in Giavarina (2015)):

# Maximise the size of the plot
par(pty = "m")
# Get domain and range
domain <- max(means) - min(means)
range <- max(percent_diffs) - min(percent_diffs)
# Scatter plot
plot(
    # Data to plot
    means, percent_diffs,
    # Axis labels
    main = "Giavarina Plot for Two Hypothetical Measurement Methods", xlab = "Mean", ylab = "Percentage Difference (%)",
    pch = 21, col = "black", bg="gray",
    # Axis control
    xlim = c(min(means) - 0.1 * domain, max(means) + 0.25 * domain),
    ylim = c(min(percent_diffs) - 0.3 * range, max(percent_diffs) + 0.3 * range),
    xaxs = "i", yaxs = "i"
)
# Zero line
abline(h = 0, col = "darkgrey")
# Upper confidence interval
abline(h = upper_loa, lty = "dashed", col = "gray18")
# Bias
abline(h = bias, lty = "dashed", col = "gray18")
# Lower confidence interval
abline(h = lower_loa, lty = "dashed", col = "gray18")
# Upper confidence interval labels
text(max(means) + 0.13 * domain, upper_loa + 0.04 * range, labels = "+1.96×SD")
text(max(means) + 0.13 * domain, upper_loa - 0.04 * range, labels = sprintf("%+4.2f%%", upper_loa))
# Bias labels
text(max(means) + 0.13 * domain, bias + 0.04 * range, labels = "Bias")
text(max(means) + 0.13 * domain, bias - 0.04 * range, labels = sprintf("%+4.2f%%", bias))
# Lower confidence interval labels
text(max(means) + 0.13 * domain, lower_loa + 0.04 * range, labels = "-1.96×SD")
text(max(means) + 0.13 * domain, lower_loa - 0.04 * range, labels = sprintf("%+4.2f%%", lower_loa))
# X-values for confidence interval lines
left <- min(means) - 0.08 * domain
mid <- min(means) - 0.05 * domain
right <- min(means) - 0.02 * domain
# Upper confidence interval lines
segments(left, upper_loa_ci_upper, x1 = right, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, upper_loa_ci_lower, x1 = mid, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, upper_loa_ci_lower, x1 = right, y1 = upper_loa_ci_lower, lty = "dashed", col = "gray68")
# Bias confidence interval lines
segments(left, bias_ci_upper, x1 = right, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(mid, bias_ci_lower, x1 = mid, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(left, bias_ci_lower, x1 = right, y1 = bias_ci_lower, lty = "dashed", col = "gray68")
# Lower confidence interval lines
segments(left, lower_loa_ci_upper, x1 = right, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, lower_loa_ci_lower, x1 = mid, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, lower_loa_ci_lower, x1 = right, y1 = lower_loa_ci_lower, lty = "dashed", col = "gray68")

3 A Bland-Altman Function

When doing multiple Bland-Altman calculations it’s often useful to wrap the analysis in a function so that multiple data sets can be analysed in quick succession. Here’s an example of such a function:

bland_altman_analysis <- function(df) {
    # Individual sample calculations
    df[['Mean']] <- (df[[colnames(df)[1]]] + df[[colnames(df)[2]]]) / 2
    df[['Diff']] <- df[[colnames(df)[2]]] - df[[colnames(df)[1]]]

    # Whole sample calculations
    summary <- data.frame()
    summary[1, paste("Mean of", colnames(df)[1])] <- mean(df[[colnames(df)[1]]])
    summary[1, paste("Mean of", colnames(df)[2])] <- mean(df[[colnames(df)[2]]])
    # Sample size
    summary[1, "N"] <- nrow(df)
    # Degrees of freedom
    summary[1, "DoF"] <- nrow(df) - 1
    # Bias (mean difference)
    mean_diff <- mean(df[["Diff"]])
    summary[1, "Mean Diff (Bias)"] <- mean_diff
    # Population standard deviation of the differences
    st_dev_diff <- sd(df[["Diff"]]) * sqrt((nrow(df) - 1) / nrow(df))
    summary[1, "SD Diffs"] <- st_dev_diff
    summary[1, "Lower LoA"] <- mean_diff - 1.96 * st_dev_diff
    summary[1, "Upper LoA"] <- mean_diff + 1.96 * st_dev_diff
    # Within-subject standard deviation
    df[['Sample SD']] <- apply(df[colnames(df)[c(1, 2)]], 1, sd)
    df[['Sample Variance']] <- df[['Sample SD']]**2
    s_w <- sqrt(mean(df[['Sample Variance']]))
    summary[1, "Within-Subject SD (Sw)"] <- s_w
    # Coefficient of repeatability
    summary[1, "Repeatability Coefficient (RC)"] <- sqrt(2) * 1.96 * s_w

    # Return
    return(list(df = df, summary = summary))
}

…and here is how it can be used:

3.1 O’Brien and Kaiser’s Repeated-Measures Data

From the R Documentation: “These contrived repeated-measures data are taken from O’Brien and Kaiser (1985). The data are from an imaginary study in which 16 female and male subjects, who are divided into three treatments, are measured at a pretest, postest, and a follow-up session; during each session, they are measured at five occasions at intervals of one hour. The design, therefore, has two between-subject and two within-subject factors.”

library(carData)

df <- OBrienKaiser[c("pre.3", "pre.4")]
results <- bland_altman_analysis(df)
df <- results[[1]]
summary <- results[[2]]
print(summary)
##   Mean of pre.3 Mean of pre.4  N DoF Mean Diff (Bias) SD Diffs Lower LoA
## 1        5.3125           4.5 16  15          -0.8125 1.775484 -4.292449
##   Upper LoA Within-Subject SD (Sw) Repeatability Coefficient (RC)
## 1  2.667449                1.38067                       3.827022

3.2 Statsmodels

This example mirrors the one in the statsmodels documentation, using randomly generated data for two hypothetical methods:

set.seed(9999)
m1 <- runif(20)
m2 <- runif(20)
df <- data.frame(pre.1 = m2, pre.2 = m1)
results <- bland_altman_analysis(df)
df <- results[[1]]
summary <- results[[2]]
print(summary)
##   Mean of pre.1 Mean of pre.2  N DoF Mean Diff (Bias)  SD Diffs  Lower LoA
## 1     0.5148932      0.597867 20  19       0.08297379 0.5215621 -0.9392878
##   Upper LoA Within-Subject SD (Sw) Repeatability Coefficient (RC)
## 1  1.105235              0.3734378                       1.035117

3.3 Bland-Altman (1986)

These examples come from the same Bland & Altman (1986) paper[1]:

# Raw data
wright_large <- data.frame(
    `First Measurement` = c(
        494, 395, 516, 434, 476, 557, 413, 442, 650, 433, 417, 656, 267, 478, 178, 423, 427
    ),
    `Second Measurement` = c(
        490, 397, 512, 401, 470, 611, 415, 431, 638, 429, 420, 633, 275, 492, 165, 372, 421
    ), check.names = FALSE
)
# Bland-Altman analysis
output <- bland_altman_analysis(wright_large)
print(output$summary)
##   Mean of First Measurement Mean of Second Measurement  N DoF Mean Diff (Bias)
## 1                  450.3529                   445.4118 17  16        -4.941176
##   SD Diffs Lower LoA Upper LoA Within-Subject SD (Sw)
## 1 21.07541 -46.24898  36.36663               15.30667
##   Repeatability Coefficient (RC)
## 1                       42.42792
# Raw data
wright_mini <- data.frame(
    `First Measurement` = c(
        512, 430, 520, 428, 500, 600, 364, 380, 658, 445, 432, 626, 260, 477, 259, 350, 451
    ),
    `Second Measurement` = c(
        525, 415, 508, 444, 500, 625, 460, 390, 642, 432, 420, 605, 227, 467, 268, 370, 443
    ), check.names = FALSE
)
# Bland-Altman analysis
output <- bland_altman_analysis(wright_mini)
print(output$summary)
##   Mean of First Measurement Mean of Second Measurement  N DoF Mean Diff (Bias)
## 1                  452.4706                   455.3529 17  16         2.882353
##   SD Diffs Lower LoA Upper LoA Within-Subject SD (Sw)
## 1 28.01026 -52.01775  57.78245               19.91083
##   Repeatability Coefficient (RC)
## 1                       55.19001

4 References

  1. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986 Feb;327(8476):307–10. DOI: 10.1016/S0140-6736(86)90837-8. PMID: 2868172.
  2. Giavarina D. Understanding Bland Altman analysis. Biochemia Medica. 2015;25(2):141-151. DOI: 10.11613/BM.2015.015.
