scikit-learn Toy Datasets in Python: Diabetes dataset

For more information on this dataset:

See here for the user guide
See here for the documentation of the load_diabetes() function which imports this dataset
See here for the ‘homepage’ of this dataset
See here for the original publication

The diabetes dataset contains measurements taken from 442 diabetic patients:

10 baseline variables (features):
- age - age in years
- sex - male or female
- bmi - body mass index
- bp - average blood pressure
- s1 - TC: total serum cholesterol
- s2 - LDL: low-density lipoproteins
- s3 - HDL: high-density lipoproteins
- s4 - TCH: total cholesterol / HDL
- s5 - LTG: possibly log of serum triglycerides level
- s6 - GLU: blood sugar level
One target variable: a quantitative measure of disease progression one year after baseline

Each of the 10 feature variables have been mean centered and scaled by the standard deviation times the square root of the number of sample (ie the sum of squares of each column totals 1).

The dataset can be loaded using load_diabetes() or load_diabetes(as_frame=True). Both return a ‘Bunch’ object which can be indexed as if it were a dictionary with the following being the most important keys:

Key	Value
`DESCR`	Description of the dataset
`feature_names`	Names of the 10 features (the baseline measurements taken)
`data`	The 442 baseline data points, formatted as a 442x10 NumPy array by default or as a 442x10 pandas data frame if `as_frame=True` was used
`target`	The 442 one-year follow-up data points - namely the values for disease progression - formatted as a NumPy array by default or as a pandas series if `as_frame=True` was used

Example usage:

from sklearn import datasets
from matplotlib import pyplot as plt

# Load the dataset
diabetes = datasets.load_diabetes(as_frame=True)

# Don't plot the sex data
features = diabetes['feature_names']
features.remove('sex')

# Plot
fig, axs = plt.subplots(3, 3)
fig.suptitle('Diabetes Dataset')
for i in range(3):
    for j in range(3):
        n = j + i * 3
        feature = features[n]
        axs[i, j].scatter(diabetes['data'][feature], diabetes['target'], s=1)
        axs[i, j].set_xlabel(feature)
        axs[i, j].set_ylabel('target')
plt.tight_layout()
plt.show()

scikit-learn Toy Datasets in Python:Diabetes dataset

`scikit-learn` Toy Datasets in Python:
Diabetes dataset