scikit-learn
Toy Datasets in Python
For more info, see here:
https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset
The Linnerud dataset is loaded using the load_linnerud() function:
from sklearn import datasets
# Load the dataset
linnerud = datasets.load_linnerud()
This returns a ‘Bunch’ object, which is a dictionary-like container. It is usually more convenient to set the as_frame parameter to True, which causes the ‘data’ and ‘target’ values to be loaded as pandas data frames inside the Bunch object:
# Load the dataset
linnerud = datasets.load_linnerud(as_frame=True)
print(type(linnerud['data']))
## <class 'pandas.core.frame.DataFrame'>
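To make the effect of as_frame concrete, here is a minimal sketch that loads the dataset both ways and compares the results. The variable names (bunch_arrays, bunch_frames, same_values) are illustrative and not part of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

# Default call: 'data' and 'target' are NumPy arrays
bunch_arrays = datasets.load_linnerud()
# With as_frame=True: 'data' and 'target' are pandas DataFrames
bunch_frames = datasets.load_linnerud(as_frame=True)

print(type(bunch_arrays['data']))
print(type(bunch_frames['data']))

# Both hold the same 20 rows and 3 columns of values
print(bunch_arrays['data'].shape, bunch_frames['data'].shape)
same_values = np.array_equal(
    bunch_arrays['data'], bunch_frames['data'].to_numpy()
)
print(same_values)
```

Only the container type differs between the two calls; the data frame version additionally carries the column names.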
Either way, the Bunch object has the following keys:
# Show the dataset's keys
print(list(linnerud))
## ['data', 'feature_names', 'target', 'target_names', 'frame', 'DESCR', 'data_filename', 'target_filename', 'data_module']
More info on each of the keys:
Key | Description |
---|---|
DESCR | Description of the dataset |
data_filename | Location of the CSV file containing the data being imported |
target_filename | Location of the CSV file containing the target data being imported |
feature_names | Names of the 3 exercises (Chins, Situps, Jumps) |
data | The 20 data points for each of the 3 exercises, formatted as a 20x3 array |
target_names | Names of the 3 physiological variables (Weight, Waist and Pulse) |
target | The target data, namely the physiological measurements |
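The table does not cover the 'frame' key. As a short sketch, assuming the dataset was loaded with as_frame=True, the keys can also be read as attributes of the Bunch object, and 'frame' holds the features and targets side by side in a single data frame:

```python
from sklearn import datasets

linnerud = datasets.load_linnerud(as_frame=True)

# Bunch keys are also available as attributes
feature_names = list(linnerud.feature_names)
target_names = list(linnerud.target_names)

# With as_frame=True, 'frame' combines the feature and target columns
frame = linnerud.frame
print(frame.shape)
print(list(frame.columns))
```

Without as_frame=True, the 'frame' key is still present but set to None.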
# Description of the dataset
print(linnerud['DESCR'])
## .. _linnerrud_dataset:
##
## Linnerrud dataset
## -----------------
##
## **Data Set Characteristics:**
##
## :Number of Instances: 20
## :Number of Attributes: 3
## :Missing Attribute Values: None
##
## The Linnerud dataset is a multi-output regression dataset. It consists of three
## exercise (data) and three physiological (target) variables collected from
## twenty middle-aged men in a fitness club:
##
## - *physiological* - CSV containing 20 observations on 3 physiological variables:
## Weight, Waist and Pulse.
## - *exercise* - CSV containing 20 observations on 3 exercise variables:
## Chins, Situps and Jumps.
##
## .. topic:: References
##
## * Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris:
## Editions Technic.
# Location of the CSV file containing the data being imported
print(linnerud['data_filename'])
## linnerud_exercise.csv
# Location of the CSV file containing the target data being imported
print(linnerud['target_filename'])
## linnerud_physiological.csv
# Names of the 3 exercises (Chins, Situps, Jumps)
print(linnerud['feature_names'])
## ['Chins', 'Situps', 'Jumps']
# The 20 data points for each of the 3 exercises, formatted as a 20x3 array
print(linnerud['data'])
## Chins Situps Jumps
## 0 5.0 162.0 60.0
## 1 2.0 110.0 60.0
## 2 12.0 101.0 101.0
## 3 12.0 105.0 37.0
## 4 13.0 155.0 58.0
## 5 4.0 101.0 42.0
## 6 8.0 101.0 38.0
## 7 6.0 125.0 40.0
## 8 15.0 200.0 40.0
## 9 17.0 251.0 250.0
## 10 17.0 120.0 38.0
## 11 13.0 210.0 115.0
## 12 14.0 215.0 105.0
## 13 1.0 50.0 50.0
## 14 6.0 70.0 31.0
## 15 12.0 210.0 120.0
## 16 4.0 60.0 25.0
## 17 11.0 230.0 80.0
## 18 15.0 225.0 73.0
## 19 2.0 110.0 43.0
# Names of the 3 physiological variables (Weight, Waist and Pulse)
print(linnerud['target_names'])
## ['Weight', 'Waist', 'Pulse']
# The target data, namely the physiological measurements
print(linnerud['target'])
## Weight Waist Pulse
## 0 191.0 36.0 50.0
## 1 189.0 37.0 52.0
## 2 193.0 38.0 58.0
## 3 162.0 35.0 62.0
## 4 189.0 35.0 46.0
## 5 182.0 36.0 56.0
## 6 211.0 38.0 56.0
## 7 167.0 34.0 60.0
## 8 176.0 31.0 74.0
## 9 154.0 33.0 56.0
## 10 169.0 34.0 50.0
## 11 166.0 33.0 52.0
## 12 154.0 34.0 64.0
## 13 247.0 46.0 50.0
## 14 193.0 36.0 46.0
## 15 202.0 37.0 62.0
## 16 176.0 37.0 54.0
## 17 157.0 32.0 52.0
## 18 156.0 33.0 54.0
## 19 138.0 33.0 68.0
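The dataset description calls this a multi-output regression dataset, meaning that all three physiological variables are predicted at once from the three exercise variables. The following is a minimal sketch of that idea using ordinary least squares; the choice of LinearRegression is an assumption made for illustration and is not prescribed by the dataset. With only 20 samples, the score on the training data says little about how well the model generalises.

```python
from sklearn import datasets
from sklearn.linear_model import LinearRegression

# return_X_y=True gives the (data, target) pair directly as arrays
X, Y = datasets.load_linnerud(return_X_y=True)

# LinearRegression accepts a 2-D target and fits all outputs together
model = LinearRegression().fit(X, Y)
pred = model.predict(X)

# One row of coefficients per target, one column per feature
print(model.coef_.shape)
# R^2 on the training data, averaged over the three targets
r2 = model.score(X, Y)
print(r2)
```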
The groups of feature data can be plotted against the target data as follows:
import matplotlib.pyplot as plt
# Plot
fig, axs = plt.subplots(3, 3)
fig.suptitle('Linnerud Dataset')
# One subplot per (exercise, physiological variable) pair
for i in range(3):
    for j in range(3):
        axs[i, j].scatter(linnerud['data'].iloc[:, i], linnerud['target'].iloc[:, j])
        axs[i, j].set_xlabel(list(linnerud['data'])[i])
        axs[i, j].set_ylabel(list(linnerud['target'])[j])
plt.tight_layout()
plt.show()
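The scatter plots can be complemented with numbers. As a sketch, assuming the dataset was loaded with as_frame=True, the Pearson correlation between each exercise and each physiological variable can be computed with pandas; the names combined and corr are illustrative.

```python
import pandas as pd
from sklearn import datasets

linnerud = datasets.load_linnerud(as_frame=True)

# Put features and targets side by side, then correlate all columns
combined = pd.concat([linnerud['data'], linnerud['target']], axis=1)

# Keep only the exercise rows and the physiological columns
corr = combined.corr().loc[
    list(linnerud['feature_names']), list(linnerud['target_names'])
]
print(corr.round(2))
```

Each cell of the resulting 3x3 table corresponds to one of the nine subplots.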