
Passing–Bablok regression is a non-parametric way to compare two methods of measuring the same quantity, for example in order to judge whether or not they are equivalent. It was first described in Passing & Bablok (1983).1

1 General Overview

Essentially, when two different methods are used to take the same measurements, the values generated can be plotted on a scatter plot and a line-of-best-fit can be fitted. The Passing-Bablok method does this fitting by:

  • Looking at every combination of two points in the scatter plot
  • Drawing a line between the two points in each of these combinations
  • Taking the gradient of each of these lines
  • Taking a (shifted) median of these gradients to get the gradient of the overall line
  • Taking the median of the values \(y - (\text{gradient} \times x)\) across all the points to get the y-intercept of the overall line

This gradient and y-intercept define the overall line-of-best-fit. The fit has an associated confidence interval which can be interpreted as follows:

  • If 1 is within the confidence interval of the gradient and 0 is within the confidence interval of the y-intercept, then the two methods are comparable within the investigated concentration range
  • If 1 is not within the confidence interval of the gradient then there is a proportional difference between the two methods
  • If 0 is not within the confidence interval of the y-intercept then there is a systematic difference

See the Wikipedia article on Passing–Bablok regression for a longer overview.
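
For intuition, here is the core idea as a minimal, self-contained Python sketch. This is a conceptual illustration only: it skips the exclusion rules and the shifted median that the full procedure below uses, and pb_line_sketch is just an illustrative name:

from itertools import combinations
from statistics import median


def pb_line_sketch(x, y):
    """Sketch of a Passing-Bablok-style line-of-best-fit."""
    # Gradient of the line connecting each pair of points (vertical
    # lines are simply skipped in this simplified version)
    gradients = [
        (y_j - y_i) / (x_j - x_i)
        for (x_i, y_i), (x_j, y_j) in combinations(zip(x, y), 2)
        if x_i != x_j
    ]
    # The overall gradient is the median of the pairwise gradients
    b = median(gradients)
    # The overall y-intercept is the median of y - (gradient * x)
    a = median([y_k - b * x_k for x_k, y_k in zip(x, y)])
    return a, b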

2 Worked Example: Giavarina (2015)

Let’s use the following hypothetical example dataset which comes from Giavarina (2015)2. Imagine that a set of objects were each measured twice - once using ‘Method A’ and once using ‘Method B’ - giving the two lists of measurements below:

import pandas as pd

dct = {
    'Method A': [
        1, 5, 10, 20, 50, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350,
        400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000
    ],
    'Method B': [
        8, 16, 30, 24, 39, 54, 40, 68, 72, 62, 122, 80, 181, 259, 275, 380,
        320, 434, 479, 587, 626, 648, 738, 766, 793, 851, 871, 957, 1001, 960
    ]
}
df = pd.DataFrame(dct)
print(df.head())
##    Method A  Method B
## 0         1         8
## 1         5        16
## 2        10        30
## 3        20        24
## 4        50        39

First, let’s visualise this raw data:

import matplotlib.pyplot as plt

# Options
x = 5  # Want figures to be A5
plt.rc('figure', figsize=[46.82 * .5**(.5 * x), 33.11 * .5**(.5 * x)])
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rc('text.latex', preamble=r'\usepackage{textgreek}')
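# (usetex=True requires a working LaTeX installation; comment out the
# usetex and preamble lines above if LaTeX is not available)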

# Plot
ax = plt.axes()
ax.set_title('The Raw Data')
ax.set_xlabel('Method A')
ax.set_ylabel('Method B')
# Scatter plot
ax.scatter(df['Method A'], df['Method B'], c='k', s=20, alpha=0.6, marker='o')
# Get axis limits
left, right = plt.xlim()
bottom, top = plt.ylim()
# Change axis limits
ax.set_xlim(0, right)
ax.set_ylim(0, top)
# Reference line
label = 'Reference line'
ax.plot([left, right], [left, right], c='grey', ls='--', label=label)
# Set aspect ratio
ax.set_aspect('equal')
# Legend
ax.legend(frameon=False)
# Show
plt.show()

3 Calculate Gradients

Next, take each pair of points on this scatter plot and calculate the gradient (‘slope’, \(S\)) of the straight line that connects them.

  • Exclude pairs of points if:
    • They are identical (the gradient of a line connecting two points with the same coordinates is undefined), or
    • The gradient of their connecting line is equal to -1 (for reasons of symmetry that are explained in the Passing & Bablok paper)
  • If two points have the same x-value and thus their connecting line is vertical, define the gradient of this line to be either \(+\infty\) or \(-\infty\) depending on the sign of the difference of their y-values

The code below implements these rules:

import numpy as np

# Convert data to arrays
x = df['Method A'].values
y = df['Method B'].values

# Initialise a list of the gradients between each combination of two points
S = []
# Iterate over all combinations of two points
for i in range(len(x) - 1):
    for j in range(i + 1, len(x)):
        y_i, y_j, x_i, x_j = y[i], y[j], x[i], x[j]
        # Ignore identical points
        if (y_i == y_j) and (x_i == x_j):
            continue
        # Avoid division by zero errors
        if x_i == x_j:
            if y_i > y_j:
                # Set the gradient to positive infinity
                gradient = np.inf
                S.append(gradient)
                continue
            else:
                # Set the gradient to negative infinity
                gradient = -np.inf
                S.append(gradient)
                continue
        # Calculate the gradient between this pair of points
        gradient = (y_i - y_j) / (x_i - x_j)
        # Ignore any gradient equal to -1
        if gradient == -1:
            continue
        # Add the gradient to the list of gradients
        S.append(gradient)

Because we are considering every pair of points in our dataset and our dataset has a sample size of 30, the number of ways to connect two points is \(\binom{30}{2} = 435\). However, one pair of points - \(\left( 70, 72 \right)\) and \(\left( 80, 62 \right)\) - has a connecting line with a gradient of -1, and Passing & Bablok tell us to exclude such combinations. The number of gradients we have calculated, \(N\), is thus 434:

# N is the number of elements in S (which might be fewer than the number of
# pairwise combinations of points)
N = len(S)
print(N)
## 434
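
As a quick arithmetic check on the combination count (using only the standard library):

from math import comb

# 30 choose 2 pairwise combinations, minus the one excluded pair
print(comb(30, 2), comb(30, 2) - 1)
## 435 434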

4 Calculate the (Shifted) Median

In preparation for taking the median value of the gradients, sort the list of gradients:

# Sort the list of gradients in preparation for taking the median
S.sort()

However, as Passing & Bablok point out, the values of these gradients are not independent and so their median would be a biased estimator of the gradient of the overall line-of-best-fit. As such, we need to use an offset, \(K\), to calculate a shifted median, \(b\), which can be used as an estimate for the overall gradient. This offset is defined as the number of gradients that have a value of less than -1:

# K is the number of gradients less than -1
S = np.array(S)
K = (S < -1).sum()
print(K)
## 5

…and the estimated overall gradient of the line-of-best-fit is determined by taking a shifted median of all the gradients:

# Calculate the shifted median
if N % 2 != 0:
    # If N is odd
    idx = (N + 1) / 2 + K
    # Convert to an integer and adjust for the fact that Python is 0-indexed
    idx = int(idx) - 1
    b = S[idx]
else:
    # If N is even
    idx = N / 2 + K
    # Convert to an integer and adjust for the fact that Python is 0-indexed
    idx = int(idx) - 1
    b = 0.5 * (S[idx] + S[idx + 1])
print(b)
## 1.055312195800306
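
As a sanity check on the index arithmetic (an equivalence that follows from the definition above, rather than being stated in the paper): shifting the median index by \(K\) is the same as taking the ordinary median of the sorted gradients after discarding the \(2K\) smallest ones:

# The shifted median equals the plain median of the gradients with the
# 2K smallest values dropped
print(np.isclose(b, np.median(S[2 * K:])))
## True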

Using this estimated gradient of the line-of-best-fit, we can plug in the raw data to get the estimated y-intercept of the line-of-best-fit, \(a\), as follows:

a = np.median(y - b * x)
print(a)
## 7.081855791962137

This gives us:

print(f'Line-of-best-fit: y = {a:4.2f} + {b:4.2f} x')
## Line-of-best-fit: y = 7.08 + 1.06 x

These values match the source we are using (Giavarina (2015)).
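
As an independent sanity check (this is not part of the Passing-Bablok procedure), SciPy's Theil-Sen estimator - a related method that also takes a median of pairwise gradients, though without the shift or the exclusion rules - should produce a broadly similar line:

from scipy import stats as st

# Theil-Sen estimate: median of pairwise gradients (related, but not
# identical, to Passing-Bablok)
slope, intercept, lo_slope, up_slope = st.theilslopes(y, x)
print(f'Theil-Sen: y = {intercept:4.2f} + {slope:4.2f} x')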

5 Calculate the Confidence Intervals

Usually we are interested in the 95% confidence interval, and so we could simply use the well-known fact that such an interval corresponds to about 1.96 (or about 2) standard deviations either side of the mean. However, let's calculate this explicitly to make sure we are being accurate:

from scipy import stats as st

# Confidence level
C = 0.95  # 95%
# Significance level
gamma = 1 - C  # 0.05
# Quantile (the cumulative probability; two-tailed)
q = 1 - (gamma / 2)  # 0.975
# Critical z-score, calculated using the percent-point function (aka the
# quantile function) of the normal distribution
w = st.norm.ppf(q)  # 1.96

print(w)
## 1.959963984540054

Passing & Bablok provide formulas for the indices of the sorted gradients that correspond to the bounds of the confidence intervals:

# Sample size
n = len(x)

# Intermediate values
C_gamma = w * np.sqrt((n * (n - 1) * (2 * n + 5)) / 18)
M_1 = np.round((N - C_gamma) / 2)
M_2 = N - M_1 + 1

print(C_gamma, M_1, M_2)
## 109.8571032220829 162.0 273.0

# Get the lower and upper bounds of the confidence interval for the gradient
# (convert floats to ints and subtract 1 to adjust for Python being 0-indexed)
b_L = S[int(M_1) + K - 1]
b_U = S[int(M_2) + K - 1]
# Get the lower and upper bounds of the confidence interval for the y-intercept
a_L = np.median(y - b_U * x)
a_U = np.median(y - b_L * x)

We can check that these results indeed still match the source material (Giavarina (2015)):

print(f'Gradient = {b:4.2f} ({b_L:4.2f} to {b_U:4.2f})')
print(f'y-intercept = {a:4.2f} ({a_L:4.2f} to {a_U:4.2f})')
## Gradient = 1.06 (1.02 to 1.09)
## y-intercept = 7.08 (-0.30 to 19.84)
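
These intervals can be read off against the rules from the overview: 1 lies outside the gradient's confidence interval while 0 lies inside the y-intercept's, so by those rules there is a proportional difference between the two methods. A direct translation of those rules into code:

# Apply the interpretation rules from the General Overview
if (b_L <= 1 <= b_U) and (a_L <= 0 <= a_U):
    print('The two methods are comparable over the investigated range')
if not (b_L <= 1 <= b_U):
    print('There is a proportional difference between the two methods')
if not (a_L <= 0 <= a_U):
    print('There is a systematic difference between the two methods')
## There is a proportional difference between the two methods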

6 Visualise the Regression

Re-plotting the raw data with the regression line plus confidence interval added in:

# Plot
ax = plt.axes()
ax.set_title('Passing-Bablok Regression')
ax.set_xlabel('Method A')
ax.set_ylabel('Method B')
# Scatter plot
ax.scatter(df['Method A'], df['Method B'], c='k', s=20, alpha=0.6, marker='o')
# Get axis limits
left, right = plt.xlim()
bottom, top = plt.ylim()
# Change axis limits
ax.set_xlim(0, right)
ax.set_ylim(0, top)
# Reference line
label = 'Reference line'
ax.plot([left, right], [left, right], c='grey', ls='--', label=label)
# Passing-Bablok regression line
x = np.array([left, right])
y = a + b * x
ax.plot(x, y, label=f'{b:4.2f}x + {a:4.2f}')
# Passing-Bablok regression line - confidence intervals
x = np.array([left, right])
y_L = a_L + b_L * x
y_U = a_U + b_U * x
label = f'Upper CI: {b_U:4.2f}x + {a_U:4.2f}'
ax.plot(x, y_U, c='tab:blue', alpha=0.2, label=label)
label = f'Lower CI: {b_L:4.2f}x + {a_L:4.2f}'
ax.plot(x, y_L, c='tab:blue', alpha=0.2, label=label)
ax.fill_between(x, y_U, y_L, alpha=0.2)
# Set aspect ratio
ax.set_aspect('equal')
# Legend
ax.legend(frameon=False)
# Show
plt.show()

7 Worked Example: Zaiontz (2025)

We'll use the following data from the Real Statistics page:

x = np.array([
    347, 249, 369, 286, 329, 410, 267, 295, 500, 286, 271, 506, 117, 329, 132,
    274, 277, 198
])
y = np.array([
    371, 283, 373, 341, 353, 454, 214, 230, 510, 295, 286, 453, 114, 328, 109,
    203, 305, 154
])

We’ll do the same process but wrapped into a function:

def passing_bablok(x, y):
    """Perform Passing-Bablok analysis."""
    # Initialise a list of the gradients between each combination of two points
    S = []
    # Iterate over all combinations of two points
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            y_i, y_j, x_i, x_j = y[i], y[j], x[i], x[j]
            # Ignore identical points
            if (y_i == y_j) and (x_i == x_j):
                continue
            # Avoid division by zero errors
            if x_i == x_j:
                if y_i > y_j:
                    # Set the gradient to positive infinity
                    gradient = np.inf
                    S.append(gradient)
                    continue
                else:
                    # Set the gradient to negative infinity
                    gradient = -np.inf
                    S.append(gradient)
                    continue
            # Calculate the gradient between this pair of points
            gradient = (y_i - y_j) / (x_i - x_j)
            # Ignore any gradient equal to -1
            if gradient == -1:
                continue
            # Add the gradient to the list of gradients
            S.append(gradient)

    # N is the number of elements in S (which might be fewer than the number
    # of pairwise combinations of points)
    N = len(S)

    # Sort the list of gradients in preparation for taking the median
    S.sort()

    # K is the number of gradients less than -1
    S = np.array(S)
    K = (S < -1).sum()

    # Calculate the shifted median
    if N % 2 != 0:
        # If N is odd
        idx = (N + 1) / 2 + K
        # Convert to an integer and adjust for Python being 0-indexed
        idx = int(idx) - 1
        b = S[idx]
    else:
        # If N is even
        idx = N / 2 + K
        # Convert to an integer and adjust for Python being 0-indexed
        idx = int(idx) - 1
        b = 0.5 * (S[idx] + S[idx + 1])
    a = np.median(y - b * x)

    # Confidence level
    C = 0.95  # 95%
    # Significance level
    gamma = 1 - C  # 0.05
    # Quantile (the cumulative probability; two-tailed)
    q = 1 - (gamma / 2)  # 0.975
    # Critical z-score, calculated using the percent-point function (aka the
    # quantile function) of the normal distribution
    w = st.norm.ppf(q)  # 1.96

    # Sample size
    n = len(x)

    # Intermediate values
    C_gamma = w * np.sqrt((n * (n - 1) * (2 * n + 5)) / 18)
    M_1 = np.round((N - C_gamma) / 2)
    M_2 = N - M_1 + 1

    # Get lower and upper bounds of the confidence interval for the gradient
    # (convert floats to ints and subtract 1 to adjust for 0-indexing)
    b_L = S[int(M_1) + K - 1]
    b_U = S[int(M_2) + K - 1]
    # Get lower and upper bounds of the confidence interval for the y-intercept
    a_L = np.median(y - b_U * x)
    a_U = np.median(y - b_L * x)

    return a, a_L, a_U, b, b_L, b_U


a, a_L, a_U, b, b_L, b_U = passing_bablok(x, y)
print(f'Gradient = {b:6.4f} ({b_L:6.4f} to {b_U:6.4f})')
## Gradient = 1.1274 (0.9198 to 1.4564)
print(f'y-intercept = {a:6.4f} ({a_L:6.4f} to {a_U:6.4f})')
## y-intercept = -33.6179 (-134.3624 to 32.7701)

These values precisely match those from the original page.
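
As a final check, applying the function to the Giavarina (2015) data from the first worked example reproduces the earlier results:

a, a_L, a_U, b, b_L, b_U = passing_bablok(
    df['Method A'].values, df['Method B'].values
)
print(f'Gradient = {b:4.2f} ({b_L:4.2f} to {b_U:4.2f})')
## Gradient = 1.06 (1.02 to 1.09)
print(f'y-intercept = {a:4.2f} ({a_L:4.2f} to {a_U:4.2f})')
## y-intercept = 7.08 (-0.30 to 19.84)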

8 References

  1. Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in Clinical Chemistry, Part I. Journal of Clinical Chemistry and Clinical Biochemistry. 1983;21(11):709–20. DOI: 10.1515/cclm.1983.21.11.709. PMID: 6655447. ↩︎

  2. Giavarina D. Understanding Bland Altman analysis. Biochemia Medica. 2015;25(2):141–151. DOI: 10.11613/BM.2015.015. PMID: 26110027. ↩︎