For more information on this dataset:

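The diabetes dataset contains measurements taken from 442 diabetic patients: ten baseline variables (age, sex, body mass index, average blood pressure and six blood serum measurements) and a quantitative measure of disease progression one year after baseline.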

Each of the 10 feature variables has been mean-centered and scaled by the standard deviation times the square root of the number of samples (i.e. the sum of squares of each column totals 1).
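
To make the scaling concrete, here is a minimal sketch in plain Python that applies the same transformation to a made-up column. The helper name center_and_scale and the sample values are hypothetical and are not part of scikit-learn. The sketch assumes the population standard deviation (dividing by n, not n - 1) is the one intended; with that choice the sum of squares comes out as 1.

```python
import math
import statistics

def center_and_scale(column):
    """Mean-center a column, then divide by (population std * sqrt(n))."""
    n = len(column)
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation (divides by n)
    return [(x - mean) / (std * math.sqrt(n)) for x in column]

# Made-up values standing in for one raw feature column (hypothetical data)
raw_column = [23.0, 31.5, 27.2, 40.1, 35.8]
scaled = center_and_scale(raw_column)

print(sum(scaled))                 # approximately 0 (mean-centered)
print(sum(x * x for x in scaled))  # approximately 1 (unit sum of squares)
```

The same two checks (column sums near 0, column sums of squares near 1) can be run on the loaded dataset itself.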

The dataset can be loaded using load_diabetes() or load_diabetes(as_frame=True). Both return a ‘Bunch’ object, which can be indexed as if it were a dictionary. The most important keys are:

| Key | Value |
| --- | --- |
| DESCR | Description of the dataset |
| feature_names | Names of the 10 features (the baseline measurements taken) |
| data | The 442 baseline data points, formatted as a 442x10 NumPy array by default or as a 442x10 pandas data frame if as_frame=True was used |
| target | The 442 one-year follow-up data points - namely the values for disease progression - formatted as a NumPy array by default or as a pandas series if as_frame=True was used |
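
As a quick illustration of the keys listed above, the following sketch loads the dataset both ways and inspects what comes back. It assumes scikit-learn and pandas are installed; the variable names diabetes and diabetes_df are only illustrative.

```python
from sklearn.datasets import load_diabetes

# Default call: 'data' and 'target' are NumPy arrays
diabetes = load_diabetes()
print(diabetes['feature_names'])  # list of the 10 feature names
print(diabetes['data'].shape)     # (442, 10)
print(diabetes['target'].shape)   # (442,)

# With as_frame=True: 'data' and 'target' are pandas objects
diabetes_df = load_diabetes(as_frame=True)
print(type(diabetes_df['data']).__name__)    # DataFrame
print(type(diabetes_df['target']).__name__)  # Series

# A Bunch also exposes its keys as attributes
print(diabetes.data is diabetes['data'])  # True
```

Dictionary-style and attribute-style access refer to the same underlying objects, so either can be used.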

Example usage:

from sklearn import datasets
from matplotlib import pyplot as plt

# Load the dataset
diabetes = datasets.load_diabetes(as_frame=True)

# Don't plot the sex data (build a new list so the Bunch's own list is left unchanged)
features = [name for name in diabetes['feature_names'] if name != 'sex']

# Plot
fig, axs = plt.subplots(3, 3)
fig.suptitle('Diabetes Dataset')
for i in range(3):
    for j in range(3):
        n = j + i * 3
        feature = features[n]
        axs[i, j].scatter(diabetes['data'][feature], diabetes['target'], s=1)
        axs[i, j].set_xlabel(feature)
        axs[i, j].set_ylabel('target')
plt.tight_layout()
plt.show()