scikit-learn
Toy Datasets in Python
For more info, see here:
https://scikit-learn.org/stable/datasets/toy_dataset.html#optical-recognition-of-handwritten-digits-dataset
The optical recognition of handwritten digits dataset is loaded using load_digits(). This returns a ‘Bunch’ object (a dictionary-like container) whose keys include the following:
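Key | Description |
---|---|
DESCR | Description of the dataset |
images | 1797 8x8 images, each represented as an 8x8 array of integer values from 0 to 16 (stored as floats) |
data | The same 1797 images, each flattened into a 1x64 array of integer values from 0 to 16 |
target_names | Names of the target classes (i.e., the numerals 0 to 9) |
target | The target data (i.e., the numeral shown in each of the 1797 images) |
frame | A pandas DataFrame of the data and target when load_digits(as_frame=True) is used, otherwise None |
feature_names | Names of the 64 pixel features (pixel_0_0 to pixel_7_7) |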
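You can confirm the layout described in the table by inspecting the shape and dtype of each array. This is a minimal sketch that assumes a recent scikit-learn release. Note that the pixel values are whole numbers from 0 to 16, but they are stored as floating-point numbers.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# images: one 8x8 array per sample
print(digits['images'].shape)   # (1797, 8, 8)
# data: the same pixels flattened to 64 values per sample
print(digits['data'].shape)     # (1797, 64)
# target: one label per sample
print(digits['target'].shape)   # (1797,)
# target_names: the ten class labels
print(digits['target_names'])   # [0 1 2 3 4 5 6 7 8 9]
# pixel values are stored as floats, not integers
print(digits['data'].dtype)     # float64
```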
from sklearn.datasets import load_digits
# Load the dataset
digits = load_digits()
# Show the dataset's keys
print(list(digits))
## ['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR']
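A Bunch is a dictionary subclass, so its keys can also be read as attributes: digits.data is the same object as digits['data']. load_digits() also accepts a few parameters. The sketch below uses two of them:

- return_X_y returns the (data, target) pair directly instead of a Bunch.
- n_class keeps only the digits 0 to n_class - 1.

Both are documented scikit-learn parameters, but check them against the version you have installed.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Attribute access and key access return the same array
print(digits.data is digits['data'])  # True
# 'frame' is only populated when as_frame=True is passed
print(digits['frame'])                # None

# return_X_y=True skips the Bunch and returns (data, target)
X, y = load_digits(return_X_y=True)
print(X.shape, y.shape)               # (1797, 64) (1797,)

# n_class=3 keeps only the samples labelled 0, 1 or 2
X3, y3 = load_digits(n_class=3, return_X_y=True)
print(np.unique(y3))                  # [0 1 2]
```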
# Description of the dataset
print(digits['DESCR'])
## .. _digits_dataset:
##
## Optical recognition of handwritten digits dataset
## --------------------------------------------------
##
## **Data Set Characteristics:**
##
## :Number of Instances: 1797
## :Number of Attributes: 64
## :Attribute Information: 8x8 image of integer pixels in the range 0..16.
## :Missing Attribute Values: None
## :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
## :Date: July; 1998
##
## This is a copy of the test set of the UCI ML hand-written digits datasets
## https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
##
## The data set contains images of hand-written digits: 10 classes where
## each class refers to a digit.
##
## Preprocessing programs made available by NIST were used to extract
## normalized bitmaps of handwritten digits from a preprinted form. From a
## total of 43 people, 30 contributed to the training set and different 13
## to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
## 4x4 and the number of on pixels are counted in each block. This generates
## an input matrix of 8x8 where each element is an integer in the range
## 0..16. This reduces dimensionality and gives invariance to small
## distortions.
##
## For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
## T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
## L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
## 1994.
##
## .. topic:: References
##
## - C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their
## Applications to Handwritten Digit Recognition, MSc Thesis, Institute of
## Graduate Studies in Science and Engineering, Bogazici University.
## - E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
## - Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.
## Linear dimensionalityreduction using relevance weighted LDA. School of
## Electrical and Electronic Engineering Nanyang Technological University.
## 2005.
## - Claudio Gentile. A New Approximate Maximal Margin Classification
## Algorithm. NIPS. 2000.
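The preprocessing step described above counts the "on" pixels in each non-overlapping 4x4 block of a 32x32 bitmap. You can sketch it with a single NumPy reshape. The original 32x32 bitmaps are not included in load_digits(), so this sketch uses a randomly generated binary bitmap purely for illustration. The helper name block_counts is hypothetical.

```python
import numpy as np

def block_counts(bitmap, block=4):
    """Count the 'on' pixels in each non-overlapping block x block tile.

    Hypothetical helper illustrating the downsampling described in DESCR:
    a 32x32 binary bitmap becomes an 8x8 array of counts in 0..16.
    """
    n = bitmap.shape[0] // block
    # Split rows and columns into (tile index, offset in tile), then sum each tile
    return bitmap.reshape(n, block, n, block).sum(axis=(1, 3))

rng = np.random.default_rng(0)
# A made-up 32x32 binary bitmap (1 = ink, 0 = background)
bitmap = rng.integers(0, 2, size=(32, 32))
counts = block_counts(bitmap)
print(counts.shape)                   # (8, 8)
print(counts.sum() == bitmap.sum())   # True: every pixel lands in exactly one tile
```

Each 4x4 tile has 16 pixels, so every count lies between 0 and 16. This matches the range of values in the dataset.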
# The first two of the 1797 images, as 8x8 arrays of integer values from 0 to 16
print(digits['images'][:2])
## [[[ 0. 0. 5. 13. 9. 1. 0. 0.]
## [ 0. 0. 13. 15. 10. 15. 5. 0.]
## [ 0. 3. 15. 2. 0. 11. 8. 0.]
## [ 0. 4. 12. 0. 0. 8. 8. 0.]
## [ 0. 5. 8. 0. 0. 9. 8. 0.]
## [ 0. 4. 11. 0. 1. 12. 7. 0.]
## [ 0. 2. 14. 5. 10. 12. 0. 0.]
## [ 0. 0. 6. 13. 10. 0. 0. 0.]]
##
## [[ 0. 0. 0. 12. 13. 5. 0. 0.]
## [ 0. 0. 0. 11. 16. 9. 0. 0.]
## [ 0. 0. 3. 15. 16. 6. 0. 0.]
## [ 0. 7. 15. 16. 16. 2. 0. 0.]
## [ 0. 0. 1. 16. 16. 3. 0. 0.]
## [ 0. 0. 1. 16. 16. 6. 0. 0.]
## [ 0. 0. 1. 16. 16. 6. 0. 0.]
## [ 0. 0. 0. 11. 16. 10. 0. 0.]]]
# The first two of the 1797 images, flattened into 1x64 arrays of integer values from 0 to 16
print(digits['data'][:2])
## [[ 0. 0. 5. 13. 9. 1. 0. 0. 0. 0. 13. 15. 10. 15. 5. 0. 0. 3.
## 15. 2. 0. 11. 8. 0. 0. 4. 12. 0. 0. 8. 8. 0. 0. 5. 8. 0.
## 0. 9. 8. 0. 0. 4. 11. 0. 1. 12. 7. 0. 0. 2. 14. 5. 10. 12.
## 0. 0. 0. 0. 6. 13. 10. 0. 0. 0.]
## [ 0. 0. 0. 12. 13. 5. 0. 0. 0. 0. 0. 11. 16. 9. 0. 0. 0. 0.
## 3. 15. 16. 6. 0. 0. 0. 7. 15. 16. 16. 2. 0. 0. 0. 0. 1. 16.
## 16. 3. 0. 0. 0. 0. 1. 16. 16. 6. 0. 0. 0. 0. 1. 16. 16. 6.
## 0. 0. 0. 0. 0. 11. 16. 10. 0. 0.]]
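The pixel values are printed as floats (0., 5., 13., ...), but they are whole numbers between 0 and 16. If an integer array is more convenient, for example to reduce memory use, you can cast it without losing information. This is a sketch; whether you need the cast depends on your downstream code.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
data = digits['data']

# Every value is a whole number in the range 0..16
print(np.all(data == np.round(data)))   # True
print(data.min(), data.max())           # 0.0 16.0

# Cast to a small unsigned integer type; values 0..16 fit in uint8
data_int = data.astype(np.uint8)
print(data_int.dtype)                   # uint8
```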
# Names of the target classes (i.e., the numerals 0 to 9)
print(digits['target_names'])
## [0 1 2 3 4 5 6 7 8 9]
# The target data (i.e., the numeral shown in each of the 1797 images)
print(digits['target'])
## [0 1 2 ... 8 9 8]
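The printed target array is truncated, so it does not show how many examples there are of each digit. np.bincount counts the occurrences of each non-negative integer label, which makes it a quick check on class balance. The per-digit counts are printed rather than quoted here.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# counts[k] is the number of images whose target is digit k
counts = np.bincount(digits['target'], minlength=10)
for digit, count in zip(digits['target_names'], counts):
    print(digit, count)
print(counts.sum())  # 1797
```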
The arrays of numbers representing the handwritten digits can be displayed as images using Matplotlib's imshow():
import matplotlib.pyplot as plt
imgplot = plt.imshow(digits['images'][1])
plt.show()
# What digit is being displayed?
print(digits['target'][1])
## 1
imgplot = plt.imshow(digits['images'][150])
plt.show()
# What digit is being displayed?
print(digits['target'][150])
## 0
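Instead of plotting one digit at a time, you can show several images side by side with plt.subplots. The sketch below places the first ten images in a 2x5 grid and titles each one with its target. The colour map cmap='gray_r' (reversed greyscale) is an optional choice that draws dark ink on a light background.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()

# Show the first ten images in a 2x5 grid, each titled with its target label
fig, axes = plt.subplots(2, 5, figsize=(8, 4))
for ax, image, label in zip(axes.ravel(), digits['images'], digits['target']):
    ax.imshow(image, cmap='gray_r')  # dark ink on a light background
    ax.set_title(str(label))
    ax.set_axis_off()                # hide the pixel-index ticks
plt.tight_layout()
plt.show()
```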
If the array comes from the data key, it first needs to be reshaped into an 8x8 array:
imgplot = plt.imshow(digits['data'][2].reshape((8, 8)))
plt.show()
# What digit is being displayed?
print(digits['target'][2])
## 2
imgplot = plt.imshow(digits['data'][100].reshape((8, 8)))
plt.show()
# What digit is being displayed?
print(digits['target'][100])
## 4
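The same reshaping works for the whole data array at once. Reshaping it to (-1, 8, 8) should reproduce the images array exactly, which makes a handy sanity check. The reverse reshape flattens the images back into data.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# -1 lets NumPy infer the number of samples (1797)
images_from_data = digits['data'].reshape(-1, 8, 8)
print(images_from_data.shape)                              # (1797, 8, 8)
print(np.array_equal(images_from_data, digits['images']))  # True

# Going the other way: flatten each 8x8 image into 64 values
data_from_images = digits['images'].reshape(len(digits['images']), -1)
print(np.array_equal(data_from_images, digits['data']))    # True
```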