In Python, the intraclass correlation coefficient (ICC) can be calculated using the `intraclass_corr()` function from the `pingouin` library. This function’s documentation tells us that:
The intraclass correlation assesses the reliability of ratings by comparing the variability of different ratings of the same subject to the total variation across all ratings and all subjects
…and the Wikipedia page says that:
The intraclass correlation coefficient (ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other.
Note also that an *interclass* correlation coefficient exists as well; it is related to the ICC but is not the same thing.
As mentioned, the `pingouin` library will be used to calculate the ICC, and the `pandas` library will also be needed. These can be installed from the terminal with:
python3.11 -m pip install pingouin
python3.11 -m pip install pandas
After this they can be imported into Python scripts with:
import pingouin as pg
import pandas as pd
This example comes from the Real Statistics site, although it has also been included in Pingouin as a built-in example.
Let’s imagine that there are four judges each tasting 8 different types of wine and rating them from 0 to 9. The results of their assessments have been included in Pingouin, so there is a function to import this raw data directly:
data = pg.read_dataset('icc')
print(data)
## Wine Judge Scores
## 0 1 A 1
## 1 2 A 1
## 2 3 A 3
## 3 4 A 6
## 4 5 A 6
## 5 6 A 7
## 6 7 A 8
## 7 8 A 9
## 8 1 B 2
## 9 2 B 3
## 10 3 B 8
## 11 4 B 4
## 12 5 B 5
## 13 6 B 5
## 14 7 B 7
## 15 8 B 9
## 16 1 C 0
## 17 2 C 3
## 18 3 C 1
## 19 4 C 3
## 20 5 C 5
## 21 6 C 6
## 22 7 C 7
## 23 8 C 9
## 24 1 D 1
## 25 2 D 2
## 26 3 D 4
## 27 4 D 3
## 28 5 D 6
## 29 6 D 2
## 30 7 D 9
## 31 8 D 8
Pivoting this data table will make it more readable, although it’s actually more usable in its original un-pivoted (or ‘long’) format, so we won’t assign the pivoted table to a new variable:
print(pd.pivot_table(data, index='Judge', columns='Wine').T)
## Judge A B C D
## Wine
## Scores 1 1 2 0 1
## 2 1 3 3 2
## 3 3 8 1 4
## 4 6 4 3 3
## 5 6 5 5 6
## 6 7 5 6 2
## 7 8 7 7 9
## 8 9 9 9 8
The above table matches the one given in the original example, so we can be sure we’re starting from the right place with this worked example.
In order to use the `intraclass_corr()` function we need to give it four inputs:

- `data` - the input dataframe in long format (ie un-pivoted)
- `targets` - the name of the column in `data` that contains the names of the things being rated
- `raters` - the name of the column in `data` that contains the names of the things doing the rating
- `ratings` - the name of the column in `data` that contains the values of the ratings

The first of these is a dataframe and the other three are strings (as they are column names). In our example, the things being rated are Wines, the raters are the Judges and the ratings are the Scores, so here’s how to calculate the ICC:
results = pg.intraclass_corr(data=data, targets='Wine', raters='Judge', ratings='Scores')
# Pandas display options
pd.set_option('display.max_columns', 8)
pd.set_option('display.width', 200)
# Show results
print(results)
## Type Description ICC F df1 df2 pval CI95%
## 0 ICC1 Single raters absolute 0.727521 11.680026 7 24 0.000002 [0.43, 0.93]
## 1 ICC2 Single random raters 0.727689 11.786693 7 21 0.000005 [0.43, 0.93]
## 2 ICC3 Single fixed raters 0.729487 11.786693 7 21 0.000005 [0.43, 0.93]
## 3 ICC1k Average raters absolute 0.914384 11.680026 7 24 0.000002 [0.75, 0.98]
## 4 ICC2k Average random raters 0.914450 11.786693 7 21 0.000005 [0.75, 0.98]
## 5 ICC3k Average fixed raters 0.915159 11.786693 7 21 0.000005 [0.75, 0.98]
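As a sanity check, the ICC1 value in the first row can be reproduced by hand from the one-way ANOVA mean squares, using the standard single-rater formula ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the between-target and within-target mean squares. A minimal plain-Python sketch using the same wine scores:

```python
# Scores: rows = wines (targets), columns = judges A-D (raters)
scores = [
    [1, 2, 0, 1],
    [1, 3, 3, 2],
    [3, 8, 1, 4],
    [6, 4, 3, 3],
    [6, 5, 5, 6],
    [7, 5, 6, 2],
    [8, 7, 7, 9],
    [9, 9, 9, 8],
]
n = len(scores)     # number of targets (wines)
k = len(scores[0])  # number of raters (judges)

grand_mean = sum(sum(row) for row in scores) / (n * k)
row_means = [sum(row) / k for row in scores]

# One-way ANOVA: between-target and within-target mean squares
ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
ms_within = sum(
    (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
) / (n * (k - 1))

icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(round(icc1, 3))  # 0.728
```

Note that `ms_between / ms_within` also reproduces the F statistic of 11.68 shown in the ICC1 row above.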
This output is quite verbose: you get a whole table when you probably only want one number. The different types of ICC models are detailed briefly on the Wikipedia page, but here’s a summary:

- ICC1: one-way random effects; each target is rated by a different, random set of raters
- ICC2: two-way random effects; the same raters rate every target, and the raters are treated as a random sample from a larger population
- ICC3: two-way mixed effects; the same raters rate every target, and those raters are the only ones of interest
- The ‘k’ variants (ICC1k, ICC2k, ICC3k) give the reliability of the *average* of the k raters rather than of a single rater

For this example the Real Statistics page uses ICC2 (single random raters), which is correct here: a group of four judges does not represent the entire population of people who could rate wine, each judge tasted all 8 wines, and we want to know the reliability of the raters as individuals. Here’s how to select that value from the results table:
results = results.set_index('Description')
icc = results.loc['Single random raters', 'ICC']
print(icc.round(3))
## 0.728
This is the same value as in the original example.
The function also gives the 95% confidence interval:
lower_ci = results.loc['Single random raters', 'CI95%'][0]
upper_ci = results.loc['Single random raters', 'CI95%'][1]
print(f'ICC = {icc:.3f}, 95% CI [{lower_ci}, {upper_ci}]')
## ICC = 0.728, 95% CI [0.43, 0.93]
The source code for this function might be useful if you want to take a look at how exactly it works; it can be found in the Pingouin repository on GitHub.
Again we can look at the Wikipedia page for help, as it gives this guide for interpreting the ICC (reproduced from Cicchetti1):
| Inter-rater agreement | Intraclass correlation |
|---|---|
| Poor | Less than 0.40 |
| Fair | Between 0.40 and 0.59 |
| Good | Between 0.60 and 0.74 |
| Excellent | Between 0.75 and 1.00 |
…plus this alternative one from Koo and Li2:
| Inter-rater agreement | Intraclass correlation |
|---|---|
| Poor | Less than 0.50 |
| Moderate | Between 0.50 and 0.75 |
| Good | Between 0.75 and 0.90 |
| Excellent | Between 0.90 and 1.00 |
These can be coded up into functions as follows:
def interpret_icc_cicchetti(icc):
    """Interpret the inter-rater agreement."""
    if icc < 0.4:
        return 'poor'
    elif icc < 0.6:
        return 'fair'
    elif icc < 0.75:
        return 'good'
    elif icc <= 1:
        return 'excellent'
    else:
        raise ValueError(f'Invalid value for the ICC: {icc}')


def interpret_icc_koo_li(icc):
    """Interpret the inter-rater agreement."""
    if icc < 0.5:
        return 'poor'
    elif icc < 0.75:
        return 'moderate'
    elif icc < 0.9:
        return 'good'
    elif icc <= 1:
        return 'excellent'
    else:
        raise ValueError(f'Invalid value for the ICC: {icc}')
Our result of 0.728 can now be interpreted automatically:
icc = results.loc['Single random raters', 'ICC']
agreement = interpret_icc_cicchetti(icc)
print(f"An inter-rater agreement of {icc.round(3)} is {agreement}")
## An inter-rater agreement of 0.728 is good
icc = results.loc['Single random raters', 'ICC']
agreement = interpret_icc_koo_li(icc)
print(f"An inter-rater agreement of {icc.round(3)} is {agreement}")
## An inter-rater agreement of 0.728 is moderate
If you are only interested in the agreement amongst a subset of the raters you can filter the dataset accordingly. Here’s the agreement between Judges A and B:
data = data[data['Judge'].isin(['A', 'B'])]
results = pg.intraclass_corr(data, 'Wine', 'Judge', 'Scores')
results = results.set_index('Type')
icc = results.loc['ICC1', 'ICC']
print(icc.round(3))
## 0.671
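The same idea extends to every pair of judges. Here’s a hand-rolled sketch that loops over all pairs using the one-way single-rater ICC formula directly (the `icc1()` helper below is illustrative, not part of Pingouin; for the A–B pair it reproduces the 0.671 above):

```python
from itertools import combinations

# Wide-format scores for the four judges (same data as above)
judges = {
    'A': [1, 1, 3, 6, 6, 7, 8, 9],
    'B': [2, 3, 8, 4, 5, 5, 7, 9],
    'C': [0, 3, 1, 3, 5, 6, 7, 9],
    'D': [1, 2, 4, 3, 6, 2, 9, 8],
}

def icc1(columns):
    """One-way random-effects ICC (single rater) from raw score columns."""
    rows = list(zip(*columns))  # one row per wine
    n, k = len(rows), len(columns)
    grand = sum(map(sum, rows)) / (n * k)
    means = [sum(r) / k for r in rows]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for r, m in zip(rows, means) for x in r
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Agreement for every pair of judges
for pair in combinations(judges, 2):
    print(pair, round(icc1([judges[j] for j in pair]), 3))
```

This avoids re-running `intraclass_corr()` on each filtered subset, although in practice the Pingouin route shown above is the more robust choice.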
Often you will have raw data that is in wide format:
dct = {
'Judge A': [1, 1, 3, 6, 6, 7, 8, 9],
'Judge B': [2, 3, 8, 4, 5, 5, 7, 9],
'Judge C': [0, 3, 1, 3, 5, 6, 7, 9],
'Judge D': [1, 2, 4, 3, 6, 2, 9, 8],
}
df = pd.DataFrame(dct)
print(df)
## Judge A Judge B Judge C Judge D
## 0 1 2 0 1
## 1 1 3 3 2
## 2 3 8 1 4
## 3 6 4 3 3
## 4 6 5 5 6
## 5 7 5 6 2
## 6 8 7 7 9
## 7 9 9 9 8
It will need to be converted into long format before `intraclass_corr()` can be used. This can be done by creating a new column that will form the targets and then converting to long format with the `melt()` function from Pandas:
df['index'] = df.index
df = pd.melt(df, id_vars=['index'], value_vars=list(df)[:-1])
print(df)
## index variable value
## 0 0 Judge A 1
## 1 1 Judge A 1
## 2 2 Judge A 3
## 3 3 Judge A 6
## 4 4 Judge A 6
## 5 5 Judge A 7
## 6 6 Judge A 8
## 7 7 Judge A 9
## 8 0 Judge B 2
## 9 1 Judge B 3
## 10 2 Judge B 8
## 11 3 Judge B 4
## 12 4 Judge B 5
## 13 5 Judge B 5
## 14 6 Judge B 7
## 15 7 Judge B 9
## 16 0 Judge C 0
## 17 1 Judge C 3
## 18 2 Judge C 1
## 19 3 Judge C 3
## 20 4 Judge C 5
## 21 5 Judge C 6
## 22 6 Judge C 7
## 23 7 Judge C 9
## 24 0 Judge D 1
## 25 1 Judge D 2
## 26 2 Judge D 4
## 27 3 Judge D 3
## 28 4 Judge D 6
## 29 5 Judge D 2
## 30 6 Judge D 9
## 31 7 Judge D 8
The ICC can then be calculated as usual:
results = pg.intraclass_corr(df, 'index', 'variable', 'value')
results = results.set_index('Description')
icc = results.loc['Single random raters', 'ICC']
print(icc.round(3))
## 0.728
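As an aside, the same wide-to-long conversion can be written more compactly by chaining `reset_index()` (which turns the row index into an `index` column) with `melt()`. This is just an alternative sketch of the step above, not what the original example uses:

```python
import pandas as pd

# Same wide-format data as above: one column of scores per judge
df = pd.DataFrame({
    'Judge A': [1, 1, 3, 6, 6, 7, 8, 9],
    'Judge B': [2, 3, 8, 4, 5, 5, 7, 9],
    'Judge C': [0, 3, 1, 3, 5, 6, 7, 9],
    'Judge D': [1, 2, 4, 3, 6, 2, 9, 8],
})

# reset_index() adds the targets column; melt() stacks the judge
# columns into the long-format 'variable'/'value' pair
long_df = df.reset_index().melt(id_vars='index')
print(long_df.shape)  # (32, 3)
```

The resulting `long_df` has the same `index`, `variable` and `value` columns as before, so it can be passed straight to `intraclass_corr()`.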