This tutorial is about plotting Bland-Altman graphs. For a more complete tutorial on Bland-Altman analysis as a whole, see here. For a tutorial on scatter plots in general, see here, and for line plots in general, see here.
Bland-Altman plots are used to assess the agreement between two methods of measuring something, usually clinical information. They were discussed in Bland & Altman’s 1986 paper (available here and/or here) and see also the Wikipedia page. Here is the example data that was used in the 1986 paper in data frame format (note that, in order to use data frames, the pandas
library first needs to be imported):
import pandas as pd
# Peak expiratory flow rate measurements made using a Wright peak flow meter
# and a mini Wright peak flow meter.
# - https://www-users.york.ac.uk/~mb55/datasets/pefr.dct
# - https://www-users.york.ac.uk/~mb55/datasets/datasets.htm
# - https://www-users.york.ac.uk/~mb55/meas/ba.pdf
df = pd.DataFrame({
'Wright Mini': [
512, 430, 520, 428, 500, 600, 364, 380, 658,
445, 432, 626, 260, 477, 259, 350, 451
],
'Wright Large': [
494, 395, 516, 434, 476, 557, 413, 442, 650,
433, 417, 656, 267, 478, 178, 423, 427
]
})
This data shows the maximum speed of expiration (the ‘peak expiratory flow rate’, or PEFR) of 17 people. It was collected using two different measurement devices: a large and a mini Wright peak flow meter.
The data can be plotted as a scatter graph:
ax = plt.axes()
create an axes object called ax
on which points can be plottedax.scatter(x, y)
plots x- and y-data as points in a scatter graph on the axes ax
ax.set_title()
, ax.set_ylabel()
and ax.set_xlabel()
import matplotlib.pyplot as plt
ax = plt.axes()
ax.scatter(df['Wright Large'], df['Wright Mini'])
ax.set_title('Peak Expiratory Flow Rate')
ax.set_ylabel('Mini Meter (L/min)')
ax.set_xlabel('Large Meter (L/min)')
Now let’s add a line of equality. This shows ideal agreement: the closer the points are to the line the better the two measurement methods agree with each other.
ax.get_xlim()
to get the lowest and highest numbers on the x-axis. Note that these are not the smallest and largest values that have been plotted; as you can see from the plot above, when the graph is created there is some whitespace added around the points to make the graph look better. This means that the range over which the x-axis runs is larger than that of the data.ax.plot()
ax.set_ylim()
and ax.set_xlim()
.plt.show()
is an explicit instruction to (re-)show the plot# Get the axis limits
left, right = ax.get_xlim()
# Plot the line of equality
ax.plot([0, right], [0, right])
# Set the axis limits
ax.set_ylim(0, right)
ax.set_xlim(0, right)
plt.show()
Finally, let’s make some aesthetic improvements:
ax.scatter()
and ax.plot()
calls:
ax.set_aspect('equal')
functionlabel
keyword argument in the ax.plot()
call and, secondly, by adding a legend via ax.legend()
(use the frameon=False
option to remove the legend’s border):ax = plt.axes()
ax.scatter(df['Wright Large'], df['Wright Mini'], c='k', s=20, alpha=0.6, marker='o')
ax.set_title('Peak Expiratory Flow Rate')
ax.set_ylabel('Mini Meter (L/min)')
ax.set_xlabel('Large Meter (L/min)')
# Get axis limits
left, right = ax.get_xlim()
# Reference line
ax.plot([0, right], [0, right], c='grey', ls='--', label='Line of Equality')
# Set axis limits
ax.set_ylim(0, right)
ax.set_xlim(0, right)
# Set aspect ratio
ax.set_aspect('equal')
# Legend
ax.legend(frameon=False)
plt.show()
The Bland-Altman analysis is discussed here, but it can be replicated using the following code:
import pandas as pd
import numpy as np
means = df.mean(axis=1)
diffs = df.diff(axis=1).iloc[:, -1]
# Average difference (aka the bias)
bias = np.mean(diffs)
# Sample standard deviation
sd = np.std(diffs, ddof=1)
# Limits of agreement
upper_loa = bias + 2 * sd
lower_loa = bias - 2 * sd
The results can now be plotted in a new scatter graph:
ax = plt.axes()
ax.scatter(means, diffs, c='k', s=20, alpha=0.6, marker='o')
ax.set_title('Bland-Altman Plot for Two Methods of Measuring PEFR')
ax.set_ylabel('Difference (L/min)')
ax.set_xlabel('Mean (L/min)')
# Get axis limits
left, right = plt.xlim()
bottom, top = plt.ylim()
# Set y-axis limits
max_y = max(abs(bottom), abs(top))
ax.set_ylim(-max_y * 1.1, max_y * 1.1)
# Set x-axis limits
domain = right - left
ax.set_xlim(left, left + domain * 1.1)
plt.show()
We should add in the zero line, the bias line and the limits of agreement to give these points some context. Horizontal lines that span the entire width like these can be added easily using ax.axhline()
:
# Plot the zero line
ax.axhline(y=0, c='k', lw=0.5)
# Plot the bias and the limits of agreement
ax.axhline(y=upper_loa, c='grey', ls='--')
ax.axhline(y=bias, c='grey', ls='--')
ax.axhline(y=lower_loa, c='grey', ls='--')
plt.show()
Next, let’s add the values of the limits of agreement and of the bias right onto the graph. This can be done using annotations created by the ax.annotate()
function. Annotations consist of three elements:
'offset pixels'
as this will allow us to specify the positions of our text labels relative to the points we are annotating.These elements should be specified in the ax.annotate()
calls, in the same order as which they are discussed above.
ax.annotate('+2×SD', (right, upper_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{upper_loa:+4.2f}', (right, upper_loa), (-10, -27), textcoords='offset pixels')
ax.annotate('Bias', (right, bias), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{bias:+4.2f}', (right, bias), (-10, -27), textcoords='offset pixels')
ax.annotate('-2×SD', (right, lower_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{lower_loa:+4.2f}', (right, lower_loa), (-10, -27), textcoords='offset pixels')
plt.show()
Again, we will just replicate the code for calculating these here:
import scipy.stats as stats
# Sample size
n = df.shape[0]
# Variance
var = sd**2
# Standard error of the bias
se_bias = np.sqrt(var / n)
# Standard error of the limits of agreement
se_loas = np.sqrt(3 * var / n)
# Endpoints of the range that contains 95% of the Student’s t distribution
t_interval = stats.t.interval(alpha=0.95, df=n - 1)
# Confidence intervals
ci_bias = bias + np.array(t_interval) * se_bias
ci_upperloa = upper_loa + np.array(t_interval) * se_loas
ci_lowerloa = lower_loa + np.array(t_interval) * se_loas
These can now be added to our plot. This is done by plotting a series of straight lines:
# Plot the confidence intervals
ax.plot([left] * 2, list(ci_upperloa), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_bias), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_lowerloa), c='grey', ls='--', alpha=0.5)
# Plot the confidence intervals' caps
x_range = [left - domain * 0.025, left + domain * 0.025]
ax.plot(x_range, [ci_upperloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_upperloa[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[0]] * 2, c='grey', ls='--', alpha=0.5)
# Adjust the x- and y-axis limits
max_y = max(abs(ci_upperloa[1]), abs(ci_lowerloa[0]))
ax.set_ylim(-max_y * 1.05, max_y * 1.05)
ax.set_xlim(left - domain * 0.05, left + domain * 1.1)
plt.show()
See here for more about using Latex formatting in the title and axes’ labels and see here for more about changing the image size. Here it is in action:
# Make figures A5 in size
A = 5
plt.rc('figure', figsize=[46.82 * .5**(.5 * A), 33.11 * .5**(.5 * A)])
# Image quality
plt.rc('figure', dpi=141)
# Be able to add Latex
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rc('text.latex', preamble=r'\usepackage{textgreek}')
#
# Plot
#
ax = plt.axes()
ax.scatter(means, diffs, c='k', s=20, alpha=0.6, marker='o')
ax.set_title('Bland-Altman Plot for Two Methods of Measuring PEFR')
ax.set_ylabel(r'Difference, $d$ (L/min)')
ax.set_xlabel(r'Mean, $\mu$ (L/min)')
# Set x-axis limits
left, right = plt.xlim()
domain = right - left
ax.set_xlim(left - domain * 0.05, left + domain * 1.1)
# Plot the zero line
ax.axhline(y=0, c='k', lw=0.5)
# Plot the bias and the limits of agreement
ax.axhline(y=upper_loa, c='grey', ls='--')
ax.axhline(y=bias, c='grey', ls='--')
ax.axhline(y=lower_loa, c='grey', ls='--')
# Add the annotations
ax.annotate('+2×SD', (right, upper_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{upper_loa:+4.2f}', (right, upper_loa), (-10, -27), textcoords='offset pixels')
ax.annotate('Bias', (right, bias), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{bias:+4.2f}', (right, bias), (-10, -27), textcoords='offset pixels')
ax.annotate('-2×SD', (right, lower_loa), (-10, 10), textcoords='offset pixels')
ax.annotate(f'{lower_loa:+4.2f}', (right, lower_loa), (-10, -27), textcoords='offset pixels')
# Plot the confidence intervals
ax.plot([left] * 2, list(ci_upperloa), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_bias), c='grey', ls='--', alpha=0.5)
ax.plot([left] * 2, list(ci_lowerloa), c='grey', ls='--', alpha=0.5)
# Plot the confidence intervals' caps
x_range = [left - domain * 0.025, left + domain * 0.025]
ax.plot(x_range, [ci_upperloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_upperloa[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_bias[0]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[1]] * 2, c='grey', ls='--', alpha=0.5)
ax.plot(x_range, [ci_lowerloa[0]] * 2, c='grey', ls='--', alpha=0.5)
# Set y-axis limits
max_y = max(abs(ci_upperloa[1]), abs(ci_lowerloa[0]))
ax.set_ylim(-max_y * 1.05, max_y * 1.05)
plt.show()
Finish by saving the figure as a PNG, JPG, PDF or other type of image with plt.savefig(<filename>)
where <filename>
is the name you want the image file to have with the extension included. If you are plotting more than one figure in the same Python script, use plt.figure()
and plt.close()
before and after each, respectively, in order to tell Python when one plot ends and the next one starts.