Hand-Written Digit Recognition in Python Using Scikit-learn Library

Introduction:

Hello folks! In this article we'll learn how to use scikit-learn to apply a machine learning classification algorithm to the "digits" dataset, which ships with sklearn.datasets.

from sklearn import datasets
digits = datasets.load_digits()

The dataset contains 1797 samples in 10 classes (the digits 0 to 9). Each sample is an 8x8 image of a digit, so every data point has a dimensionality of 64:

Classes	10
Samples per class	~180
Samples total	1797
Dimensionality	64
Features	integers 0-16
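You can confirm these numbers yourself with a few lines. This sketch only assumes that scikit-learn and NumPy are installed:

```python
from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()

# digits.images holds each sample as an 8x8 grid of pixel intensities
print(digits.images.shape)   # (1797, 8, 8)
# digits.data holds the same samples flattened into 64 features each
print(digits.data.shape)     # (1797, 64)
# digits.target holds the class label (the digit 0-9) of each sample
print(np.unique(digits.target))  # [0 1 2 3 4 5 6 7 8 9]
# Pixel values are integers from 0 to 16, stored as floats
print(digits.data.min(), digits.data.max())  # 0.0 16.0
# Number of samples in each class (roughly 180 per digit)
print(np.bincount(digits.target))
```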

Each image in the dataset is stored as an 8x8 matrix of pixel values. For example, to look at the first image:

digits.images[0]

The cell above outputs the matrix of pixel values for the image stored at index 0:

array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])
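This 8x8 matrix and the 64 features mentioned earlier are the same sample viewed two ways: digits.data stores each image flattened row by row. A small check, assuming NumPy and scikit-learn are available:

```python
from sklearn.datasets import load_digits
import numpy as np

digits = load_digits()

first_image = digits.images[0]   # the 8x8 matrix shown above
first_row = digits.data[0]       # the same sample as a flat vector of 64 features

# Flattening the matrix row by row gives exactly the 64-feature vector
print(np.array_equal(first_image.ravel(), first_row))  # True
# Reshaping the vector back to 8x8 recovers the matrix
print(np.array_equal(first_row.reshape(8, 8), first_image))  # True
# The label of this sample
print(digits.target[0])  # 0
```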

Now, let's plot this matrix using the matplotlib.pyplot library:

import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation='nearest')

The code above renders the matrix as an 8x8 grayscale image. Because it uses the reversed gray colormap (gray_r), higher pixel values appear darker.

Next, convert the whole dataset into a pandas DataFrame, which makes it more convenient to work with.
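Here is a minimal sketch of that conversion, assuming pandas is installed. The column names pixel_0 to pixel_63 and the extra target column are illustrative choices, not something the dataset requires:

```python
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()

# One column per pixel; these column names are our own choice
df = pd.DataFrame(digits.data, columns=[f"pixel_{i}" for i in range(64)])
# Append the label so features and target live in one table
df["target"] = digits.target

print(df.shape)  # (1797, 65)
# Count of samples per digit, ordered by digit
print(df["target"].value_counts().sort_index())
```

Recent scikit-learn versions can also build the DataFrame for you with load_digits(as_frame=True); the returned object's frame attribute then holds the features together with the target.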

We also use PCA (Principal Component Analysis). PCA is mainly used for dimensionality reduction, but it can also explain the variance-covariance structure of a set of variables through linear combinations of those variables (the principal components).
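As a sketch of how PCA shrinks the 64 pixel features, the snippet below keeps just enough principal components to explain 95% of the variance. The 95% threshold is an assumption for illustration; the post does not say how many components were kept.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
import numpy as np

X = load_digits().data

# A float n_components asks PCA for the fewest components explaining 95% of the variance
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)
# Cumulative share of variance explained by the retained components
print(np.cumsum(pca.explained_variance_ratio_)[-1])
```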

Fit PCA to the data and scale the data accordingly. Then train a model with SVC (Support Vector Classifier). An SVC fits the data and finds the hyperplane that best separates it into the different classes; that hyperplane is then used to classify, or predict, unseen data. With 10 classes, scikit-learn's SVC combines several such binary separators, one for each pair of classes (one-vs-one).
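The scaling, PCA and SVC steps can be chained in a scikit-learn pipeline. This is a sketch under assumptions: the post does not state the order of scaling and PCA, the number of components, or the kernel, so StandardScaler before PCA, 30 components and a linear kernel (which matches the hyperplane description directly) are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

# Assumed order: standardize each pixel, project onto principal components, classify
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=30),    # illustrative number of components
    SVC(kernel="linear"),    # linear kernel: separating hyperplanes in PCA space
)
model.fit(X, y)

svc = model.named_steps["svc"]
print(len(svc.classes_))     # 10
# One hyperplane per pair of classes: 10 * 9 / 2 = 45 rows, one weight per PCA component
print(svc.coef_.shape)       # (45, 30)
print(model.predict(X[:5]))  # predicted digits for the first five samples
```

Note that this fits on the full dataset, so the predictions above are for samples the model has already seen; a fair evaluation needs a held-out test set.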

To train the model and check its accuracy, we split the whole dataset into two parts: 80% for training and the remaining 20% for testing how well the model has learned. In the end, the model reaches an accuracy of about 87% on the test set.
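A sketch of the 80/20 split and the accuracy check. The pipeline here (StandardScaler, PCA with 30 components, SVC with default settings) is an assumption, since the post does not list its exact settings; random_state=42 and stratify are also my own choices, to make the split reproducible and keep class proportions similar. The accuracy it prints depends on these choices and need not match the 87% reported in this post.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target,
    test_size=0.2,            # 20% held out for testing, 80% for training
    random_state=42,          # fixed seed so the split is reproducible
    stratify=digits.target,   # keep class proportions similar in both parts
)

model = make_pipeline(StandardScaler(), PCA(n_components=30), SVC())
# Scaler, PCA and SVC are all fitted on the training data only
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(len(X_train), len(X_test))  # 1437 360
print("test accuracy:", accuracy_score(y_test, y_pred))
```

Keeping the scaler and PCA inside the pipeline means their statistics come from the training split alone, so nothing about the test images leaks into training.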

Check the video to see the code in action, and leave a comment with any suggestions.

Thank You.

I am thankful to the mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a coding internship experience. Thank you, www.suvenconsultants.com
