⇦ Back

This tutorial is about plotting Bland-Altman graphs. For a more complete tutorial on Bland-Altman analysis as a whole, see here. For a tutorial on scatter plots in general, see here, and for line plots in general, see here.

Bland-Altman plots are used to assess the agreement between two methods of measuring something, usually clinical information. They were discussed in Bland & Altman’s 1986 paper (available here and/or here) and see also the Wikipedia page. Here is the example data that was used in the 1986 paper in data frame format (note that, in order to use data frames, the pandas library first needs to be imported):

import pandas as pd

# Peak expiratory flow rate measurements made using a Wright peak flow meter
# and a mini Wright peak flow meter.
# - https://www-users.york.ac.uk/~mb55/datasets/pefr.dct
# - https://www-users.york.ac.uk/~mb55/datasets/datasets.htm
# - https://www-users.york.ac.uk/~mb55/meas/ba.pdf
df = pd.DataFrame({
    'Wright Mini': [
        512, 430, 520, 428, 500, 600, 364, 380, 658,
        445, 432, 626, 260, 477, 259, 350, 451
    ],
    'Wright Large': [
        494, 395, 516, 434, 476, 557, 413, 442, 650,
        433, 417, 656, 267, 478, 178, 423, 427
    ]
})

This data shows the maximum speed of expiration (the ‘peak expiratory flow rate’, or PEFR) of 17 people. It was collected using two different measurement devices: a large and a mini Wright peak flow meter.

1 Value-Value Plot

The data can be plotted as a scatter graph:

  • ax = plt.axes() create an axes object called ax on which points can be plotted
  • ax.scatter(x, y) plots x- and y-data as points in a scatter graph on the axes ax
  • A figure title and axis labels can be added with ax.set_title(), ax.set_ylabel() and ax.set_xlabel()
import matplotlib.pyplot as plt

ax = plt.axes()
ax.scatter(df['Wright Large'], df['Wright Mini'])
ax.set_title('Peak Expiratory Flow Rate')
ax.set_ylabel('Mini Meter (L/min)')
ax.set_xlabel('Large Meter (L/min)')

Now let’s add a line of equality. This shows ideal agreement: the closer the points are to the line the better the two measurement methods agree with each other.

  • First, use ax.get_xlim() to get the lowest and highest numbers on the x-axis. Note that these are not the smallest and largest values that have been plotted; as you can see from the plot above, when the graph is created there is some whitespace added around the points to make the graph look better. This means that the range over which the x-axis runs is larger than that of the data.
  • Once the left and right extents of the x-axis have been returned, a straight line can be created from the origin to the upper-right corner with ax.plot()
  • If we were to take a look at the graph now the line of equality would be hanging in the middle of the plot area because the axis limits would have been updated when it was plotted. Manually set the x-axis limits you want using ax.set_ylim() and ax.set_xlim().
  • plt.show() is an explicit instruction to (re-)show the plot
# Get the axis limits
left, right = ax.get_xlim()
# Plot the line of equality
ax.plot([0, right], [0, right])
# Set the axis limits
ax.set_ylim(0, right)
ax.set_xlim(0, right)

plt.show()

Finally, let’s make some aesthetic improvements:

  • Change the colour, size, transparency, marker shape and line type of the plot points and the line of equality by adding keyword arguments in the ax.scatter() and ax.plot() calls:
    • c or color will change the marker or line colour (see your options here)
    • s will change the marker or line size
    • alpha will change the transparency
    • marker will change the marker type (see your options here)
    • ls will change the line style (see your options here)
  • Make the axes’ scales equal by setting the aspect ratio using the ax.set_aspect('equal') function
  • Add a legend to give a label to the line of equality. This can be done by first using the label keyword argument in the ax.plot() call and, secondly, by adding a legend via ax.legend() (use the frameon=False option to remove the legend’s border):
ax = plt.axes()
ax.scatter(df['Wright Large'], df['Wright Mini'], c='k', s=20, alpha=0.6, marker='o')
ax.set_title('Peak Expiratory Flow Rate')
ax.set_ylabel('Mini Meter (L/min)')
ax.set_xlabel('Large Meter (L/min)')
# Get axis limits
left, right = ax.get_xlim()
# Reference line
ax.plot([0, right], [0, right], c='grey', ls='--', label='Line of Equality')
# Set axis limits
ax.set_ylim(0, right)
ax.set_xlim(0, right)
# Set aspect ratio
ax.set_aspect('equal')
# Legend
ax.legend(frameon=False)

plt.show()

2 Bland-Altman Plot

The Bland-Altman analysis is discussed here, but it can be replicated using the following code:

import pandas as pd
import numpy as np

means = df.mean(axis=1)
diffs = df.diff(axis=1).iloc[:, -1]
# Average difference (aka the bias)
bias = np.mean(diffs)
# Sample standard deviation
sd = np.std(diffs, ddof=1)
# Limits of agreement
upper_loa = bias + 2 * sd
lower_loa = bias - 2 * sd

The results can now be plotted in a new scatter graph:

ax = plt.axes()
ax.scatter(means, diffs, c='k', s=20, alpha=0.6, marker='o')
ax.set_title('Bland-Altman Plot for Two Methods of Measuring PEFR')
ax.set_ylabel('Difference (L/min)')
ax.set_xlabel('Mean (L/min)')
# Get axis limits
left, right = plt.xlim()
bottom, top = plt.ylim()
# Set y-axis limits
max_y = max(abs(bottom), abs(top))
ax.set_ylim(-max_y * 1.1, max_y * 1.1)
# Set x-axis limits
domain = right - left
ax.set_xlim(left, left + domain * 1.1)

plt.show()

We should add in the zero line, the bias line and the limits of agreement to give these points some context. Horizontal lines that span the entire width like these can be added easily using ax.axhline():

# Plot the zero line
ax.axhline(y=0, c='k', lw=0.5)
# Plot the bias and the limits of agreement
ax.axhline(y=upper_loa, c='grey', ls='--')
ax.axhline(y=bias, c='grey', ls='--')
ax.axhline(y=lower_loa, c='grey', ls='--')

plt.show()

Next, let’s add the values of the limits of agreement and of the bias right onto the graph. This can be done using annotations created by the ax.annotate() function. Annotations consist of three elements:

  • The text to display on the graph
  • The xy coordinates of the point you want to annotate, specified as a tuple (ie two numbers separated by a comma and inside round brackets)
  • The x- and y-coordinates of the text label, more properly called xytext. This is again a tuple.
  • Optionally, you can choose the coordinate system used by the xytext argument. This option is called textcoords. We will use 'offset pixels' as this will allow us to specify the positions of our text labels relative to the points we are annotating.

These elements should be specified in the ax.annotate() calls, in the same order as which they are discussed above.

ax.annotate('+2×SD', (right, upper_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{upper_loa:+4.2f}', (right, upper_loa), (-10, -27), textcoords='offset pixels')
ax.annotate('Bias', (right, bias), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{bias:+4.2f}', (right, bias), (-10, -27), textcoords='offset pixels')
ax.annotate('-2×SD', (right, lower_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{lower_loa:+4.2f}', (right, lower_loa), (-10, -27), textcoords='offset pixels')

plt.show()

2.1 Confidence Intervals

Again, we will just replicate the code for calculating these here:

import scipy.stats as stats

# Sample size
n = df.shape[0]
# Variance
var = sd**2
# Standard error of the bias
se_bias = np.sqrt(var / n)
# Standard error of the limits of agreement
se_loas = np.sqrt(3 * var / n)
# Endpoints of the range that contains 95% of the Student’s t distribution
t_interval = stats.t.interval(alpha=0.95, df=n - 1)
# Confidence intervals
ci_bias = bias + np.array(t_interval) * se_bias
ci_upperloa = upper_loa + np.array(t_interval) * se_loas
ci_lowerloa = lower_loa + np.array(t_interval) * se_loas

These can now be added to our plot. This is done by plotting a series of straight lines:

# Plot the confidence intervals
ax.plot([left] * 2, list(ci_upperloa), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_bias), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_lowerloa), c='grey', ls='--', alpha=0.5)
# Plot the confidence intervals' caps
x_range = [left - domain * 0.025, left + domain * 0.025]
ax.plot(x_range, [ci_upperloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_upperloa[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[0]] * 2, c='grey', ls='--', alpha=0.5)
# Adjust the x- and y-axis limits
max_y = max(abs(ci_upperloa[1]), abs(ci_lowerloa[0]))
ax.set_ylim(-max_y * 1.05, max_y * 1.05)
ax.set_xlim(left - domain * 0.05, left + domain * 1.1)

plt.show()

3 Latex and Image Size

See here for more about using Latex formatting in the title and axes’ labels and see here for more about changing the image size. Here it is in action:

# Make figures A5 in size
A = 5
plt.rc('figure', figsize=[46.82 * .5**(.5 * A), 33.11 * .5**(.5 * A)])
# Image quality
plt.rc('figure', dpi=141)
# Be able to add Latex
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rc('text.latex', preamble=r'\usepackage{textgreek}')

#
# Plot
#
ax = plt.axes()
ax.scatter(means, diffs, c='k', s=20, alpha=0.6, marker='o')
ax.set_title('Bland-Altman Plot for Two Methods of Measuring PEFR')
ax.set_ylabel(r'Difference, $d$ (L/min)')
ax.set_xlabel(r'Mean, $\mu$ (L/min)')
# Set x-axis limits
left, right = plt.xlim()
domain = right - left
ax.set_xlim(left - domain * 0.05, left + domain * 1.1)
# Plot the zero line
ax.axhline(y=0, c='k', lw=0.5)
# Plot the bias and the limits of agreement
ax.axhline(y=upper_loa, c='grey', ls='--')
ax.axhline(y=bias, c='grey', ls='--')
ax.axhline(y=lower_loa, c='grey', ls='--')
# Add the annotations
ax.annotate('+2×SD', (right, upper_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{upper_loa:+4.2f}', (right, upper_loa), (-10, -27), textcoords='offset pixels')
ax.annotate('Bias', (right, bias), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{bias:+4.2f}', (right, bias), (-10, -27), textcoords='offset pixels')
ax.annotate('-2×SD', (right, lower_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{lower_loa:+4.2f}', (right, lower_loa), (-10, -27), textcoords='offset pixels')
# Plot the confidence intervals
ax.plot([left] * 2, list(ci_upperloa), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_bias), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_lowerloa), c='grey', ls='--', alpha=0.5)
# Plot the confidence intervals' caps
x_range = [left - domain * 0.025, left + domain * 0.025]
ax.plot(x_range, [ci_upperloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_upperloa[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[0]] * 2, c='grey', ls='--', alpha=0.5)
# Set y-axis limits
max_y = max(abs(ci_upperloa[1]), abs(ci_lowerloa[0]))
ax.set_ylim(-max_y * 1.05, max_y * 1.05)

plt.show()

Finish by saving the figure as a PNG, JPG, PDF or other type of image with plt.savefig(<filename>) where <filename> is the name you want the image file to have with the extension included. If you are plotting more than one figure in the same Python script, use plt.figure() and plt.close() before and after each, respectively, in order to tell Python when one plot ends and the next one starts.

⇦ Back