This tutorial is about plotting Bland-Altman graphs. For a more complete tutorial on Bland-Altman analysis as a whole, see here. For a tutorial on scatter plots in general, see here and, for line plots in general, see here.
Bland-Altman plots are used to assess the agreement between two methods of measuring something, usually clinical information. They were discussed in Bland & Altman’s 1986 paper (available here and/or here) and see also the Wikipedia page. Here is the example data that was used in the 1986 paper in data frame format:
# Peak expiratory flow rate measurements made using a Wright peak flow meter
# and a mini Wright peak flow meter
# - https://www-users.york.ac.uk/~mb55/datasets/pefr.dct
# - https://www-users.york.ac.uk/~mb55/datasets/datasets.htm
# - https://www-users.york.ac.uk/~mb55/meas/ba.pdf
df <- data.frame(
`Wright Mini` = c(
512, 430, 520, 428, 500, 600, 364, 380, 658,
445, 432, 626, 260, 477, 259, 350, 451
),
`Wright Large` = c(
494, 395, 516, 434, 476, 557, 413, 442, 650,
433, 417, 656, 267, 478, 178, 423, 427
),
check.names = FALSE
)
This data shows the maximum speed of expiration (the ‘peak expiratory flow rate’, or PEFR) of 17 people. It was collected using two different measurement devices: a large and a mini Wright peak flow meter.
The data can be plotted as a scatter graph using the plot()
function. Remember to use double square brackets when indexing the data frame to get your x- and y-data out because you want the numbers as vectors not as data frames:
x <- df[["Wright Large"]]
y <- df[["Wright Mini"]]
plot(x, y)
Add a title and axis labels with the main()
, xlab()
and ylab()
keyword arguments:
x <- df[["Wright Large"]]
y <- df[["Wright Mini"]]
plot(
x, y, main = "Peak Expiratory Flow Rate",
xlab = "Large Meter (L/min)", ylab = "Mini Meter (L/min)"
)
Now let’s add a line of equality. This shows ideal agreement: the closer the points are to the line the better the two measurement methods agree with each other.
abline()
function (to create a line ‘from a to b’). Use the arguments 0
and 1
to give the line a y-intercept of 0 and a slope/gradient of 1. Set the lty
and col
keyword arguments to customise the line type and colour, respectively (for more on the customisation options, see here).legend()
function for this, specifying the name, line type and colour of the line to appear inside it. You will also need to specify its position which can be done in one of two ways:
"bottomright"
, "top"
or"center"
# Plot
plot(
x, y, main = "Peak Expiratory Flow Rate",
xlab = "Large Meter (L/min)", ylab = "Mini Meter (L/min)"
)
# Reference line
abline(0, 1, lty = "dashed", col = "gray18")
# Include a legend
legend("topleft", "Line of Equality", lty = "dashed", col = "gray18")
Finally, let’s make some aesthetic improvements:
plot()
function:
xlim
and ylim
inside the plot()
call. By default, the axes will actually be 6% larger than what you specify; this is usually a good thing because it ensures that all the data that has been plotted is visible. However, this behaviour can be overridden by using xaxs = "i"
and yaxs = "i"
which forces the axis limits to be exactly what was specified. For the record, i
stands for “internal” and you can revert back to the default behaviour by using xaxs = "r"
and yaxs = "r"
where r
is for “regular”.par(pty="s")
which changes the parameters of the plot to make it square. This function needs to be provided before the plot()
call:# Make the plot square
par(pty = "s")
# Plot
plot(
# Data to plot
x, y, pch = 21, col = "black", bg="gray",
# Graph labels
main = "Peak Expiratory Flow Rate",
xlab = "Large Meter (L/min)", ylab = "Mini Meter (L/min)",
# Axis control
xlim = c(0, 700), ylim = c(0, 700), xaxs = "i", yaxs = "i"
)
# Reference line
abline(0, 1, lty = "dashed", col = "gray18")
# Include a legend
legend(0, 700, "Line of Equality", lty = "dashed", col = "gray18")
The Bland-Altman analysis is discussed here, but it can be replicated using the following code:
df$means <- (df[["Wright Large"]] + df[["Wright Mini"]]) / 2
df$diffs <- df[["Wright Large"]] - df[["Wright Mini"]]
# Average difference (aka the bias)
bias <- mean(df$diffs)
# Sample standard deviation
sd <- sd(df$diffs)
# Limits of agreement
upper_loa <- bias + 2 * sd
lower_loa <- bias - 2 * sd
The results can now be plotted in a new scatter graph (we’re using par(pty = "m")
here to maximise the size of the plot):
# Maximise the size of the plot
par(pty = "m")
# Get domain and range
top <- max(df$diffs)
bottom <- min(df$diffs)
left <- min(df$means)
right <- max(df$means)
domain <- right - left
range <- top - bottom
# Scatter plot
plot(
# Data to plot
df$means, df$diffs, pch = 21, col = "black", bg="gray",
# Axis labels
main = "Bland-Altman Plot for Two Methods of Measuring PEFR",
xlab = "Mean (L/min)", ylab = "Difference (L/min)",
# Axis control
xlim = c(left - 0.1 * domain, right + 0.2 * domain),
ylim = c(bottom - 0.1 * range, top + 0.1 * range),
xaxs = "i", yaxs = "i"
)
We should add in the zero line, the bias line and the limits of agreement to give these points some context. Horizontal lines that span the entire width like these can be added easily using abline()
as before:
# Maximise the size of the plot
par(pty = "m")
# Get domain and range
top <- max(df$diffs)
bottom <- min(df$diffs)
left <- min(df$means)
right <- max(df$means)
domain <- right - left
range <- top - bottom
# Scatter plot
plot(
# Data to plot
df$means, df$diffs, pch = 21, col = "black", bg="gray",
# Axis labels
main = "Bland-Altman Plot for Two Methods of Measuring PEFR",
xlab = "Mean (L/min)", ylab = "Difference (L/min)",
# Axis control
xlim = c(left - 0.1 * domain, right + 0.2 * domain),
ylim = c(bottom - 0.1 * range, top + 0.1 * range),
xaxs = "i", yaxs = "i"
)
# Zero line
abline(h = 0, col = "darkgrey")
# Upper confidence interval
abline(h = upper_loa, lty = "dashed", col = "gray18")
# Bias
abline(h = bias, lty = "dashed", col = "gray18")
# Lower confidence interval
abline(h = lower_loa, lty = "dashed", col = "gray18")
Next, let’s add the values of the limits of agreement and of the bias right onto the graph. This can be done using annotations created by the text()
function. Annotations consist of three elements:
xlim()
argument to accommodate the labelssignif()
or sprintf()
function can be used to set the number of significant figures or the number of decimal places being displayed in a number# Maximise the size of the plot
par(pty = "m")
# Get domain and range
top <- max(df$diffs)
bottom <- min(df$diffs)
left <- min(df$means)
right <- max(df$means)
domain <- right - left
range <- top - bottom
# Scatter plot
plot(
# Data to plot
df$means, df$diffs, pch = 21, col = "black", bg="gray",
# Axis labels
main = "Bland-Altman Plot for Two Methods of Measuring PEFR",
xlab = "Mean (L/min)", ylab = "Difference (L/min)",
# Axis control
xlim = c(left - 0.1 * domain, right + 0.2 * domain),
ylim = c(bottom - 0.1 * range, top + 0.1 * range),
xaxs = "i", yaxs = "i"
)
# Zero line
abline(h = 0, col = "darkgrey")
# Upper confidence interval
abline(h = upper_loa, lty = "dashed", col = "gray18")
# Bias
abline(h = bias, lty = "dashed", col = "gray18")
# Lower confidence interval
abline(h = lower_loa, lty = "dashed", col = "gray18")
# Upper confidence interval labels
text(right + 0.13 * domain, upper_loa + 0.04 * range, labels = "+2×SD")
text(right + 0.13 * domain, upper_loa - 0.04 * range, labels = sprintf("%+4.2f", upper_loa))
# Bias labels
text(right + 0.13 * domain, bias + 0.04 * range, labels = "Bias")
text(right + 0.13 * domain, bias - 0.04 * range, labels = sprintf("%+4.2f", bias))
# Lower confidence interval labels
text(right + 0.13 * domain, lower_loa + 0.04 * range, labels = "-2×SD")
text(right + 0.13 * domain, lower_loa - 0.04 * range, labels = sprintf("%+4.2f", lower_loa))
Again, we will just replicate the code for calculating these here:
# Sample size
n <- nrow(df)
# We want 95% confidence intervals
conf_int <- 0.95
# Endpoints of the range that contains 95% of the Student’s t distribution
t1 <- qt((1 - conf_int) / 2, df = n - 1)
t2 <- qt((conf_int + 1) / 2, df = n - 1)
# Variance
var <- sd**2
# Standard error of the bias
se_bias <- sqrt(var / n)
# Standard error of the limits of agreement
se_loas <- sqrt(3 * var / n)
# Confidence intervals
upper_loa_ci_lower <- upper_loa + t1 * se_loas
upper_loa_ci_upper <- upper_loa + t2 * se_loas
bias_ci_lower <- bias + t1 * se_bias
bias_ci_upper <- bias + t2 * se_bias
lower_loa_ci_lower <- lower_loa + t1 * se_loas
lower_loa_ci_upper <- lower_loa + t2 * se_loas
These can now be added to our plot. This is done by plotting a series of straight lines with segments()
and carefully controlling the positions of their left and right edges:
# Maximise the size of the plot
par(pty = "m")
# Get domain and range
max_y <- abs(max(upper_loa_ci_upper, lower_loa_ci_lower))
top <- max_y
bottom <- -max_y
range <- top - bottom
left <- min(df$means)
right <- max(df$means)
domain <- right - left
# Scatter plot
plot(
# Data to plot
df$means, df$diffs, pch = 21, col = "black", bg="gray",
# Axis labels
main = "Bland-Altman Plot for Two Methods of Measuring PEFR",
xlab = "Mean (L/min)", ylab = "Difference (L/min)",
# Axis control
xlim = c(left - 0.1 * domain, right + 0.2 * domain), xaxs = "i",
ylim = c(bottom, top),
)
# Zero line
abline(h = 0, col = "darkgrey")
# Upper confidence interval
abline(h = upper_loa, lty = "dashed", col = "gray18")
# Bias
abline(h = bias, lty = "dashed", col = "gray18")
# Lower confidence interval
abline(h = lower_loa, lty = "dashed", col = "gray18")
# Upper confidence interval labels
text(right + 0.13 * domain, upper_loa + 0.04 * range, labels = "+2×SD")
text(right + 0.13 * domain, upper_loa - 0.04 * range, labels = sprintf("%+4.2f", upper_loa))
# Bias labels
text(right + 0.13 * domain, bias + 0.04 * range, labels = "Bias")
text(right + 0.13 * domain, bias - 0.04 * range, labels = sprintf("%+4.2f", bias))
# Lower confidence interval labels
text(right + 0.13 * domain, lower_loa + 0.04 * range, labels = "-2×SD")
text(right + 0.13 * domain, lower_loa - 0.04 * range, labels = sprintf("%+4.2f", lower_loa))
# x-Values for confidence interval lines
left <- min(df$means) - 0.08 * domain
mid <- min(df$means) - 0.05 * domain
right <- min(df$means) - 0.02 * domain
# Upper confidence interval lines
segments(left, upper_loa_ci_upper, x1 = right, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, upper_loa_ci_lower, x1 = mid, y1 = upper_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, upper_loa_ci_lower, x1 = right, y1 = upper_loa_ci_lower, lty = "dashed", col = "gray68")
# Bias confidence interval lines
segments(left, bias_ci_upper, x1 = right, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(mid, bias_ci_lower, x1 = mid, y1 = bias_ci_upper, lty = "dashed", col = "gray68")
segments(left, bias_ci_lower, x1 = right, y1 = bias_ci_lower, lty = "dashed", col = "gray68")
# Lower confidence interval lines
segments(left, lower_loa_ci_upper, x1 = right, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(mid, lower_loa_ci_lower, x1 = mid, y1 = lower_loa_ci_upper, lty = "dashed", col = "gray68")
segments(left, lower_loa_ci_lower, x1 = right, y1 = lower_loa_ci_lower, lty = "dashed", col = "gray68")
Finally, use png("Name of Plot.png")
to save the plot as a PNG file to your computer.