For this dataset’s description, see here
For this dataset’s documentation, see here

The wine recognition dataset is loaded using load_wine(). This returns a ‘Bunch’ object, which contains both the data and its metadata. By default the data is formatted as NumPy arrays, but setting the as_frame parameter to True when loading the dataset returns it as pandas objects instead:

from sklearn import datasets

# Load the dataset
wine = datasets.load_wine(as_frame=True)
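
To make the effect of as_frame concrete, the following minimal sketch loads the dataset both ways and prints the types of the returned objects. The variable names wine_np and wine_pd are illustrative only and are not used elsewhere on this page.

```python
from sklearn import datasets
import numpy as np
import pandas as pd

# Default behaviour: the feature and target data are NumPy arrays
wine_np = datasets.load_wine()
print(type(wine_np['data']))
print(wine_np['data'].shape)

# With as_frame=True the same data comes back as pandas objects
wine_pd = datasets.load_wine(as_frame=True)
print(type(wine_pd['data']))
print(type(wine_pd['target']))

# The Bunch behaves like a dictionary, so its contents can be listed by key
print(sorted(wine_pd.keys()))
```

Because the Bunch supports dictionary-style access, the data and the metadata can both be retrieved by key.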

The data contains the results of chemical analyses of 178 different wines, i.e. there are 178 samples or instances in the dataset. The wines came from 3 different cultivators in the same region of Italy, and this is the target or class information. 13 measurements were taken during each analysis, so there are 13 features or attributes. When formatted as a data frame, the data therefore consists of 178 rows and 13 + 1 columns (13 features and 1 target). The feature and target data can be extracted separately (the features as a data frame and the target as a series) or together in one data frame:

# Extract the feature data only
features = wine['data']

# Extract the target data only
target = wine['target']

# Extract the feature and target data together
df = wine['frame']

print(df.head())

##    alcohol  malic_acid   ash  ...  od280/od315_of_diluted_wines  proline  target
## 0    14.23        1.71  2.43  ...                          3.92   1065.0       0
## 1    13.20        1.78  2.14  ...                          3.40   1050.0       0
## 2    13.16        2.36  2.67  ...                          3.17   1185.0       0
## 3    14.37        1.95  2.50  ...                          3.45   1480.0       0
## 4    13.24        2.59  2.87  ...                          2.93    735.0       0
## 
## [5 rows x 14 columns]
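
The row and column counts described above can be checked directly. This is a small sketch that reloads the dataset so that it runs on its own and then compares the shapes of the three objects:

```python
from sklearn import datasets

wine = datasets.load_wine(as_frame=True)
features = wine['data']
target = wine['target']
df = wine['frame']

# 178 samples, 13 features
print(features.shape)
# One target value per sample
print(target.shape)
# Features and target combined: 13 + 1 columns
print(df.shape)

# The combined frame is the feature columns followed by the target column
print(list(df.columns[:-1]) == list(features.columns))
print(df.columns[-1])
```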

The names of the first 13 columns are the feature names, and these are also available in a separate feature_names list:

print(wine['feature_names'])

## ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

The 14th column is named target, which indicates that it holds the target information, i.e. which cultivator the wine in question came from. The classes are encoded simply as the values 0, 1 and 2:

print(df['target'].unique())

## [0 1 2]
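
The Bunch object also carries a target_names entry that gives a label for each of these integer codes. This entry is not discussed above, so treat the following as a supplementary sketch: it prints the labels, counts the samples per class and maps the codes to labels (the names counts and labels are illustrative).

```python
from sklearn import datasets

wine = datasets.load_wine(as_frame=True)
df = wine['frame']

# Generic labels for the three classes
print(wine['target_names'])

# Number of samples in each class, ordered by class code
counts = df['target'].value_counts().sort_index()
print(counts)

# Map the integer codes to their labels without modifying df
labels = df['target'].map(dict(enumerate(wine['target_names'])))
print(labels.head())
```

The labels are generic (class_0, class_1, class_2) rather than cultivator names, so the mapping is mainly useful for making plots and tables more readable.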

Example Usage

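from matplotlib import pyplot as plt
import seaborn as sns

# One box plot per feature on a 5 x 3 grid (13 features, so 2 axes are left over)
fig, axs = plt.subplots(5, 3, figsize=(8, 10))
for feature, ax in zip(wine['feature_names'], axs.flat):
    # Pass the data frame by keyword; whis=[0, 100] extends the whiskers
    # to the minimum and maximum values
    sns.boxplot(data=df, x='target', y=feature, whis=[0, 100], ax=ax)
    ax.set_title(feature)
    ax.set_ylabel('')
    ax.set_xlabel('')
# Remove the two unused axes in the bottom row
fig.delaxes(axs[4, 1])
fig.delaxes(axs[4, 2])
plt.tight_layout()
plt.show()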
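
As a numeric counterpart to the box plots, the same per-class comparison can be tabulated with pandas. This is a sketch only: the choice of statistics (minimum, median and maximum, matching the whiskers and centre line of the plots) and the variable name summary are assumptions made for illustration.

```python
from sklearn import datasets

wine = datasets.load_wine(as_frame=True)
df = wine['frame']

# Minimum, median and maximum of every feature within each class
summary = df.groupby('target')[wine['feature_names']].agg(['min', 'median', 'max'])

# Show the statistics for a single feature, one row per class
print(summary['alcohol'])
```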

scikit-learn Toy Datasets in Python: Wine recognition dataset
