This tutorial is about plotting Bland-Altman graphs. For a more complete tutorial on Bland-Altman analysis as a whole, see here. For a tutorial on scatter plots in general, see here and, for line plots in general, see here.
Bland-Altman plots are used to assess the agreement between two methods of measuring something, usually clinical information. They were discussed in Bland & Altman’s 1986 paper (available here and/or here) and see also the Wikipedia page. Here is the example data that was used in the 1986 paper in data frame format:
# Peak expiratory flow rate measurements made using a Wright peak flow meter
# and a mini Wright peak flow meter
# - https://www-users.york.ac.uk/~mb55/datasets/pefr.dct
# - https://www-users.york.ac.uk/~mb55/datasets/datasets.htm
# - https://www-users.york.ac.uk/~mb55/meas/ba.pdf
df <- data.frame(
`Wright Mini` = c(
512, 430, 520, 428, 500, 600, 364, 380, 658,
445, 432, 626, 260, 477, 259, 350, 451
),
`Wright Large` = c(
494, 395, 516, 434, 476, 557, 413, 442, 650,
433, 417, 656, 267, 478, 178, 423, 427
),
check.names = FALSE
)
This data shows the maximum speed of expiration (the ‘peak expiratory flow rate’, or PEFR) of 17 people. It was collected using two different measurement devices: a large and a mini Wright peak flow meter.
The data can be plotted as a scatter graph:
library(ggplot2)
ggplot()
function and the data frame df
that we created above. You will also need to create an aesthetic mapping with the aes()
functionp
as our variable.p
explicitly with the geom_point()
function.print()
on the plot p
library(ggplot2)
p <- ggplot(df, aes(x = `Wright Large`, y = `Wright Mini`))
p <- p + geom_point()
print(p)
Add a title and axis labels with ggtitle()
, xlab()
and ylab()
:
p <- p + ggtitle("Peak Expiratory Flow Rate")
p <- p + ylab("Mini Meter (L/min)")
p <- p + xlab("Large Meter (L/min)")
print(p)
Now let’s add a line of equality. This shows ideal agreement: the closer the points are to the line the better the two measurement methods agree with each other.
geom_abline()
to get a straight line, specifying an intercept
and a slope
(a ‘gradient’ for non-Americans)max()
function then set the axis limits to be 10% larger than this using ylim()
and xlim()
# Plot the line of equality
p <- p + geom_abline(intercept = 0, slope = 1)
# Set the axis limits
top <- max(df["Wright Mini"])
right <- max(df["Wright Large"])
p <- p + ylim(0, top * 1.1)
p <- p + xlim(0, right * 1.1)
print(p)
Finally, let’s make some aesthetic improvements:
geom_point()
function:
shape
changes the marker shape, see all the available options herecol
changes the colouralpha
changes the transparencybg
changes the background colour of the marker (aka the fill colour)size
changes the marker sizecoord_fixed()
. The default is to make the axes square, but this can be customised with the ratio
keyword argumentp <- ggplot(df, aes(x = `Wright Large`, y = `Wright Mini`))
# Plot the line of equality
p <- p + geom_abline(intercept = 0, slope = 1)
# Scatter plot
p <- p + geom_point(shape = 21, col = "black", alpha = 0.6, bg="grey", size = 2)
# Title and labels
p <- p + ggtitle("Peak Expiratory Flow Rate")
p <- p + ylab("Mini Meter (L/min)")
p <- p + xlab("Large Meter (L/min)")
# Set the axis limits
top <- max(df["Wright Mini"])
right <- max(df["Wright Large"])
p <- p + ylim(0, top * 1.1)
p <- p + xlim(0, right * 1.1)
# Set the aspect ratio
p <- p + coord_fixed()
print(p)
The Bland-Altman analysis is discussed here, but it can be replicated using the following code:
df$means <- (df[["Wright Large"]] + df[["Wright Mini"]]) / 2
df$diffs <- df[["Wright Large"]] - df[["Wright Mini"]]
# Average difference (aka the bias)
bias <- mean(df$diffs)
# Sample standard deviation
sd <- sd(df$diffs)
# Limits of agreement
upper_loa <- bias + 2 * sd
lower_loa <- bias - 2 * sd
The results can now be plotted in a new scatter graph:
p <- ggplot(df, aes(x = means, y = diffs))
p <- p + geom_point(shape = 21, col = "black", alpha = 0.6, bg="grey", size = 2)
p <- p + ggtitle("Bland-Altman Plot for Two Methods of Measuring PEFR")
p <- p + ylab("Difference (L/min)")
p <- p + xlab("Mean (L/min)")
# Make the y-axis symmetric around the 0-line
top <- max(df$diffs)
bottom <- min(df$diffs)
max_y <- max(abs(top), abs(bottom))
p <- p + ylim(-max_y * 1.1, max_y * 1.1)
print(p)
We should add in the zero line, the bias line and the limits of agreement to give these points some context. Horizontal lines that span the entire width like these can be added easily using geom_hline()
, although you should change the order of your code around becase the code that appears first gets plotted first and you want the horizontal lines to be at the bottom:
# Plot the zero line
p <- p + geom_hline(yintercept = 0, col = "black")
# Plot the bias and the limits of agreement
p <- p + geom_hline(yintercept = upper_loa, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = bias, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = lower_loa, col = "black", lty = "dashed")
print(p)
Next, let’s add the values of the limits of agreement and of the bias right onto the graph. This can be done using annotations created by the annotate()
function. Annotations consist of four elements:
xlim()
function to accommodate the labelssignif()
function can be used to set the number of significant figures being displayed in a number# Get the locations of the edges of the data
right <- max(df$means)
left <- min(df$means)
domain <- right - left
range <- max(df$diffs) - min(df$diffs)
# Enlarge the plot by 10% out to the right to make space for the annotations
right <- right + domain * 0.1
p <- p + xlim(left, right)
# Add the annotations
p <- p + annotate("text", x = right, y = upper_loa + 0.03 * range, label = "+2×SD")
p <- p + annotate("text", x = right, y = upper_loa - 0.03 * range, label = signif(upper_loa, digits = 3))
p <- p + annotate("text", x = right, y = bias + 0.03 * range, label = "Bias")
p <- p + annotate("text", x = right, y = bias - 0.03 * range, label = signif(bias, digits = 3))
p <- p + annotate("text", x = right, y = lower_loa + 0.03 * range, label = "-2×SD")
p <- p + annotate("text", x = right, y = lower_loa - 0.03 * range, label = signif(lower_loa, digits = 3))
print(p)
Again, we will just replicate the code for calculating these here:
# Sample size
n <- nrow(df)
# We want 95% confidence intervals
conf_int <- 0.95
# Endpoints of the range that contains 95% of the Student’s t distribution
t1 <- qt((1 - conf_int) / 2, df = n - 1)
t2 <- qt((conf_int + 1) / 2, df = n - 1)
# Variance
var <- sd**2
# Standard error of the bias
se_bias <- sqrt(var / n)
# Standard error of the limits of agreement
se_loas <- sqrt(3 * var / n)
# Confidence intervals
upper_loa_ci_lower <- upper_loa + t1 * se_loas
upper_loa_ci_upper <- upper_loa + t2 * se_loas
bias_ci_lower <- bias + t1 * se_bias
bias_ci_upper <- bias + t2 * se_bias
lower_loa_ci_lower <- lower_loa + t1 * se_loas
lower_loa_ci_upper <- lower_loa + t2 * se_loas
These can now be added to our plot. This is done by plotting a series of straight lines with geom_segment()
and carefully controlling the positions of their left and right edges:
# Create plot axes
p <- ggplot(df, aes(x = means, y = diffs))
# Plot the zero line
p <- p + geom_hline(yintercept = 0, col = "black")
# Plot the bias and the limits of agreement
p <- p + geom_hline(yintercept = upper_loa, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = bias, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = lower_loa, col = "black", lty = "dashed")
# Scatter plot
p <- p + geom_point(shape = 21, col = "black", alpha = 1, bg="grey", size = 2)
p <- p + ggtitle("Bland-Altman Plot for Two Methods of Measuring PEFR")
p <- p + ylab("Difference (L/min)")
p <- p + xlab("Mean (L/min)")
# Get the locations of the edges of the data
right <- max(df$means)
left <- min(df$means)
domain <- right - left
top <- max(upper_loa_ci_upper)
bottom <- min(lower_loa_ci_lower)
range <- top - bottom
# Enlarge the plot by 10% out to the right and left to make space for the annotations
right <- right + domain * 0.1
left <- left - domain * 0.1
p <- p + xlim(left, right)
# Make the y-axis symmetric around the 0-line
max_y <- max(abs(top), abs(bottom))
p <- p + ylim(-max_y, max_y)
# Add the annotations
p <- p + annotate("text", x = right, y = upper_loa + 0.03 * range, label = "+2×SD")
p <- p + annotate("text", x = right, y = upper_loa - 0.03 * range, label = signif(upper_loa, digits = 3))
p <- p + annotate("text", x = right, y = bias + 0.03 * range, label = "Bias")
p <- p + annotate("text", x = right, y = bias - 0.03 * range, label = signif(bias, digits = 3))
p <- p + annotate("text", x = right, y = lower_loa + 0.03 * range, label = "-2×SD")
p <- p + annotate("text", x = right, y = lower_loa - 0.03 * range, label = signif(lower_loa, digits = 3))
# Add the confidence intervals. Make the widths of these equal to 4% of the width of the plot
left_edge <- left
middle <- left + domain * 0.02
right_edge <- left + domain * 0.04
p <- p + geom_segment(aes(x = middle, y = upper_loa_ci_lower, xend = middle, yend = upper_loa_ci_upper))
p <- p + geom_segment(aes(x = middle, y = bias_ci_lower, xend = middle, yend = bias_ci_upper))
p <- p + geom_segment(aes(x = middle, y = lower_loa_ci_lower, xend = middle, yend = lower_loa_ci_upper))
# Add the confidence intervals' caps
p <- p + geom_segment(aes(x = left_edge, y = upper_loa_ci_upper, xend = right_edge, yend = upper_loa_ci_upper))
p <- p + geom_segment(aes(x = left_edge, y = upper_loa_ci_lower, xend = right_edge, yend = upper_loa_ci_lower))
p <- p + geom_segment(aes(x = left_edge, y = bias_ci_upper, xend = right_edge, yend = bias_ci_upper))
p <- p + geom_segment(aes(x = left_edge, y = bias_ci_lower, xend = right_edge, yend = bias_ci_lower))
p <- p + geom_segment(aes(x = left_edge, y = lower_loa_ci_upper, xend = right_edge, yend = lower_loa_ci_upper))
p <- p + geom_segment(aes(x = left_edge, y = lower_loa_ci_lower, xend = right_edge, yend = lower_loa_ci_lower))
print(p)
Finally, use ggsave("File Name.png")
to save the plot as a PNG file to your computer.