⇦ Back

This tutorial is about plotting Bland-Altman graphs. For a more complete tutorial on Bland-Altman analysis as a whole, see here. For a tutorial on scatter plots in general, see here and, for line plots in general, see here.

Bland-Altman plots are used to assess the agreement between two methods of measuring something, usually clinical information. They were discussed in Bland & Altman’s 1986 paper (available here and/or here) and see also the Wikipedia page. Here is the example data that was used in the 1986 paper in data frame format:

# Peak expiratory flow rate measurements made using a Wright peak flow meter
# and a mini Wright peak flow meter
# - https://www-users.york.ac.uk/~mb55/datasets/pefr.dct
# - https://www-users.york.ac.uk/~mb55/datasets/datasets.htm
# - https://www-users.york.ac.uk/~mb55/meas/ba.pdf
df <- data.frame(
    `Wright Mini` = c(
        512, 430, 520, 428, 500, 600, 364, 380, 658,
        445, 432, 626, 260, 477, 259, 350, 451
    ),
    `Wright Large` = c(
        494, 395, 516, 434, 476, 557, 413, 442, 650,
        433, 417, 656, 267, 478, 178, 423, 427
    ),
    check.names = FALSE
)

This data shows the maximum speed of expiration (the ‘peak expiratory flow rate’, or PEFR) of 17 people. It was collected using two different measurement devices: a large and a mini Wright peak flow meter.

1 Value-Value Plot

The data can be plotted as a scatter graph:

  • Import the ggplot library with library(ggplot2)
  • Create the plot area with the ggplot() function and the data frame df that we created above. You will also need to create an aesthetic mapping with the aes() function
  • Assign this plot to a variable. In this example, we’ve chosen p as our variable.
  • Even though you’ve created a plot area and defined a data frame with x- and y-data, the points will not yet be on the plot. They need to be added to p explicitly with the geom_point() function.
  • Display the graph by calling print() on the plot p
library(ggplot2)

p <- ggplot(df, aes(x = `Wright Large`, y = `Wright Mini`))
p <- p + geom_point()

print(p)

Add a title and axis labels with ggtitle(), xlab() and ylab():

p <- p + ggtitle("Peak Expiratory Flow Rate")
p <- p + ylab("Mini Meter (L/min)")
p <- p + xlab("Large Meter (L/min)")

print(p)

Now let’s add a line of equality. This shows ideal agreement: the closer the points are to the line the better the two measurement methods agree with each other.

  • Add a geom_abline() to get a straight line, specifying an intercept and a slope (a ‘gradient’ for non-Americans)
  • Get the maximum values of the data you are plotting using the max() function then set the axis limits to be 10% larger than this using ylim() and xlim()
# Plot the line of equality
p <- p + geom_abline(intercept = 0, slope = 1)
# Set the axis limits
top <- max(df["Wright Mini"])
right <- max(df["Wright Large"])
p <- p + ylim(0, top * 1.1)
p <- p + xlim(0, right * 1.1)

print(p)

Finally, let’s make some aesthetic improvements:

  • Change the marker shape, colour, transparency, fill colour and size using keyword arguments inside the geom_point() function:
    • shape changes the marker shape, see all the available options here
    • col changes the colour
    • alpha changes the transparency
    • bg changes the background colour of the marker (aka the fill colour)
    • size changes the marker size
  • Change the aspect ratio with coord_fixed(). The default is to make the axes square, but this can be customised with the ratio keyword argument
p <- ggplot(df, aes(x = `Wright Large`, y = `Wright Mini`))
# Plot the line of equality
p <- p + geom_abline(intercept = 0, slope = 1)
# Scatter plot
p <- p + geom_point(shape = 21, col = "black", alpha = 0.6, bg="grey", size = 2)
# Title and labels
p <- p + ggtitle("Peak Expiratory Flow Rate")
p <- p + ylab("Mini Meter (L/min)")
p <- p + xlab("Large Meter (L/min)")
# Set the axis limits
top <- max(df["Wright Mini"])
right <- max(df["Wright Large"])
p <- p + ylim(0, top * 1.1)
p <- p + xlim(0, right * 1.1)
# Set the aspect ratio
p <- p + coord_fixed()

print(p)

2 Bland-Altman Plot

The Bland-Altman analysis is discussed here, but it can be replicated using the following code:

df$means <- (df[["Wright Large"]] + df[["Wright Mini"]]) / 2
df$diffs <- df[["Wright Large"]] - df[["Wright Mini"]]
# Average difference (aka the bias)
bias <- mean(df$diffs)
# Sample standard deviation
sd <- sd(df$diffs)
# Limits of agreement
upper_loa <- bias + 2 * sd
lower_loa <- bias - 2 * sd

The results can now be plotted in a new scatter graph:

p <- ggplot(df, aes(x = means, y = diffs))
p <- p + geom_point(shape = 21, col = "black", alpha = 0.6, bg="grey", size = 2)
p <- p + ggtitle("Bland-Altman Plot for Two Methods of Measuring PEFR")
p <- p + ylab("Difference (L/min)")
p <- p + xlab("Mean (L/min)")
# Make the y-axis symmetric around the 0-line
top <- max(df$diffs)
bottom <- min(df$diffs)
max_y <- max(abs(top), abs(bottom))
p <- p + ylim(-max_y * 1.1, max_y * 1.1)

print(p)

We should add in the zero line, the bias line and the limits of agreement to give these points some context. Horizontal lines that span the entire width like these can be added easily using geom_hline(), although you should change the order of your code around becase the code that appears first gets plotted first and you want the horizontal lines to be at the bottom:

# Plot the zero line
p <- p + geom_hline(yintercept = 0, col = "black")
# Plot the bias and the limits of agreement
p <- p + geom_hline(yintercept = upper_loa, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = bias, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = lower_loa, col = "black", lty = "dashed")

print(p)

Next, let’s add the values of the limits of agreement and of the bias right onto the graph. This can be done using annotations created by the annotate() function. Annotations consist of four elements:

  • The type of element being added, in this case text
  • The x- and y-coordinates of the text label
    • It’s a good idea to locate the labels at an offset from the data itself; in the example below, offsets equal to 3% of the height of the plot (loosely, the ‘range’ of the data) are being used
    • Additionally, the plot’s width (loosely, the ‘domain’) is being enlarged by 10% to the right via the xlim() function to accommodate the labels
  • The label text itself
    • The signif() function can be used to set the number of significant figures being displayed in a number
# Get the locations of the edges of the data
right <- max(df$means)
left <- min(df$means)
domain <- right - left
range <- max(df$diffs) - min(df$diffs)
# Enlarge the plot by 10% out to the right to make space for the annotations
right <- right + domain * 0.1
p <- p + xlim(left, right)
# Add the annotations
p <- p + annotate("text", x = right, y = upper_loa + 0.03 * range, label = "+2×SD")
p <- p + annotate("text", x = right, y = upper_loa - 0.03 * range, label = signif(upper_loa, digits = 3))
p <- p + annotate("text", x = right, y = bias + 0.03 * range, label = "Bias")
p <- p + annotate("text", x = right, y = bias - 0.03 * range, label = signif(bias, digits = 3))
p <- p + annotate("text", x = right, y = lower_loa + 0.03 * range, label = "-2×SD")
p <- p + annotate("text", x = right, y = lower_loa - 0.03 * range, label = signif(lower_loa, digits = 3))

print(p)

2.1 Confidence Intervals

Again, we will just replicate the code for calculating these here:

# Sample size
n <- nrow(df)
# We want 95% confidence intervals
conf_int <- 0.95
# Endpoints of the range that contains 95% of the Student’s t distribution
t1 <- qt((1 - conf_int) / 2, df = n - 1)
t2 <- qt((conf_int + 1) / 2, df = n - 1)
# Variance
var <- sd**2
# Standard error of the bias
se_bias <- sqrt(var / n)
# Standard error of the limits of agreement
se_loas <- sqrt(3 * var / n)
# Confidence intervals
upper_loa_ci_lower <- upper_loa + t1 * se_loas
upper_loa_ci_upper <- upper_loa + t2 * se_loas
bias_ci_lower <- bias + t1 * se_bias
bias_ci_upper <- bias + t2 * se_bias
lower_loa_ci_lower <- lower_loa + t1 * se_loas
lower_loa_ci_upper <- lower_loa + t2 * se_loas

These can now be added to our plot. This is done by plotting a series of straight lines with geom_segment() and carefully controlling the positions of their left and right edges:

# Create plot axes
p <- ggplot(df, aes(x = means, y = diffs))
# Plot the zero line
p <- p + geom_hline(yintercept = 0, col = "black")
# Plot the bias and the limits of agreement
p <- p + geom_hline(yintercept = upper_loa, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = bias, col = "black", lty = "dashed")
p <- p + geom_hline(yintercept = lower_loa, col = "black", lty = "dashed")
# Scatter plot
p <- p + geom_point(shape = 21, col = "black", alpha = 1, bg="grey", size = 2)
p <- p + ggtitle("Bland-Altman Plot for Two Methods of Measuring PEFR")
p <- p + ylab("Difference (L/min)")
p <- p + xlab("Mean (L/min)")
# Get the locations of the edges of the data
right <- max(df$means)
left <- min(df$means)
domain <- right - left
top <- max(upper_loa_ci_upper)
bottom <- min(lower_loa_ci_lower)
range <- top - bottom
# Enlarge the plot by 10% out to the right and left to make space for the annotations
right <- right + domain * 0.1
left <- left - domain * 0.1
p <- p + xlim(left, right)
# Make the y-axis symmetric around the 0-line
max_y <- max(abs(top), abs(bottom))
p <- p + ylim(-max_y, max_y)
# Add the annotations
p <- p + annotate("text", x = right, y = upper_loa + 0.03 * range, label = "+2×SD")
p <- p + annotate("text", x = right, y = upper_loa - 0.03 * range, label = signif(upper_loa, digits = 3))
p <- p + annotate("text", x = right, y = bias + 0.03 * range, label = "Bias")
p <- p + annotate("text", x = right, y = bias - 0.03 * range, label = signif(bias, digits = 3))
p <- p + annotate("text", x = right, y = lower_loa + 0.03 * range, label = "-2×SD")
p <- p + annotate("text", x = right, y = lower_loa - 0.03 * range, label = signif(lower_loa, digits = 3))
# Add the confidence intervals. Make the widths of these equal to 4% of the width of the plot
left_edge <- left
middle <- left + domain * 0.02
right_edge <- left + domain * 0.04
p <- p + geom_segment(aes(x = middle, y = upper_loa_ci_lower, xend = middle, yend = upper_loa_ci_upper))
p <- p + geom_segment(aes(x = middle, y = bias_ci_lower, xend = middle, yend = bias_ci_upper))
p <- p + geom_segment(aes(x = middle, y = lower_loa_ci_lower, xend = middle, yend = lower_loa_ci_upper))
# Add the confidence intervals' caps
p <- p + geom_segment(aes(x = left_edge, y = upper_loa_ci_upper, xend = right_edge, yend = upper_loa_ci_upper))
p <- p + geom_segment(aes(x = left_edge, y = upper_loa_ci_lower, xend = right_edge, yend = upper_loa_ci_lower))
p <- p + geom_segment(aes(x = left_edge, y = bias_ci_upper, xend = right_edge, yend = bias_ci_upper))
p <- p + geom_segment(aes(x = left_edge, y = bias_ci_lower, xend = right_edge, yend = bias_ci_lower))
p <- p + geom_segment(aes(x = left_edge, y = lower_loa_ci_upper, xend = right_edge, yend = lower_loa_ci_upper))
p <- p + geom_segment(aes(x = left_edge, y = lower_loa_ci_lower, xend = right_edge, yend = lower_loa_ci_lower))

print(p)

3 Save Plot

Finally, use ggsave("File Name.png") to save the plot as a PNG file to your computer.

⇦ Back