Hand Written Digit Recognition Using scikit-learn

 Hand-Written Digit Recognition in Python Using Scikit-learn Library


Introduction:

Hello folks! In this article we'll learn how to use scikit-learn to apply a machine learning classification algorithm to the "digits" dataset, which is available in sklearn.datasets:

from sklearn import datasets
digits = datasets.load_digits()

The dataset contains 1797 samples in 10 classes (the digits 0 to 9). Each sample has a dimensionality of 64, i.e. each data point is an 8x8 image of a digit:

Classes: 10
Samples per class: ~180
Samples total: 1797
Dimensionality: 64
Features: integers 0-16
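
If you want to check the numbers in the summary above yourself, here is a minimal sketch. The variable names (n_samples, counts, and so on) are just illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# digits.data holds one flattened 8x8 image per row
n_samples, n_features = digits.data.shape

# digits.target holds the label (0 to 9) for each row
n_classes = len(np.unique(digits.target))
counts = np.bincount(digits.target)  # samples per class

print("samples:", n_samples)
print("dimensionality:", n_features)
print("classes:", n_classes)
print("samples per class:", counts)
print("feature range:", digits.data.min(), "to", digits.data.max())
```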

Each entry in digits.images holds the 8x8 matrix of pixel values for one image.
digits.images[0]

The output of the above cell is the matrix stored at index 0:

array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

Now, let's plot this matrix using the matplotlib.pyplot library:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(digits.images[0], cmap=plt.cm.gray_r, interpolation='nearest')

The above code renders the matrix as an 8x8 image:
<matplotlib.image.AxesImage at 0x7f4e18187390>

Next, convert the whole dataset into a DataFrame for convenience.
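
The DataFrame conversion could look roughly like this. It is only a sketch: the column names (pixel_0 ... pixel_63 and target) are my own assumption, since the original notebook code is not shown here.

```python
import pandas as pd
from sklearn import datasets

digits = datasets.load_digits()

# Hypothetical column names: one column per pixel of the flattened 8x8 image
columns = [f"pixel_{i}" for i in range(digits.data.shape[1])]

df = pd.DataFrame(digits.data, columns=columns)
df["target"] = digits.target  # the digit each row represents

print(df.shape)
print(df.head())
```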
We also use PCA (Principal Component Analysis), a dimensionality reduction technique that can also be used to explain the variance and covariance structure of a set of variables through linear combinations of them.
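
To make the idea concrete, here is a small sketch of PCA on the digits data. The choice of n_components=2 is an assumption made purely for illustration; the article does not state how many components were kept.

```python
from sklearn import datasets
from sklearn.decomposition import PCA

digits = datasets.load_digits()

# Assumption: keep 2 components, just to show the mechanics
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(digits.data)

# Each component is a linear combination of the 64 original pixel features
print("reduced shape:", X_reduced.shape)
print("component weights shape:", pca.components_.shape)

# Fraction of the total variance explained by each component
print("explained variance ratio:", pca.explained_variance_ratio_)
```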
Fit PCA to the data and scale the data accordingly. Next, we train on the data with an SVC (Support Vector Classifier), which fits the data and returns the best-fit hyperplane that divides it into categories. We can then use that hyperplane to classify, or predict, unseen data.
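
To see the hyperplane idea in isolation, here is a simplified sketch that is not the article's full model: it keeps only the digits 0 and 1 and uses a linear kernel (both are my assumptions), so the learned hyperplane can be read directly from coef_ and intercept_.

```python
import numpy as np
from sklearn import datasets
from sklearn.svm import SVC

digits = datasets.load_digits()

# Assumption for illustration: a two-class problem (digits 0 and 1 only)
mask = digits.target < 2
X, y = digits.data[mask], digits.target[mask]

# A linear kernel exposes the separating hyperplane w . x + b = 0
clf = SVC(kernel="linear")
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane, one weight per pixel
b = clf.intercept_[0]  # offset of the hyperplane

# The side of the hyperplane a sample falls on decides its class
scores = X @ w + b
predicted = np.where(scores > 0, 1, 0)

print("weights shape:", w.shape)
print("matches clf.predict:", (predicted == clf.predict(X)).all())
```

With all ten digits, scikit-learn's SVC trains one such classifier for every pair of classes and combines their votes.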
To train our model and check its accuracy, we split the whole dataset into two parts: 80% for training and the remaining 20% for testing how well the model has learned. In the end, the accuracy of our model comes out at about 87%.
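
Putting the steps together, a sketch of the whole workflow might look like the following. The scaler, n_components=10, the default SVC settings and random_state=0 are all assumptions on my part, so the accuracy you get will most likely differ from the figure quoted above.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = datasets.load_digits()

# 80% of the samples for training, 20% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0
)

# Assumed settings: standardize, reduce to 10 components, default SVC
model = make_pipeline(StandardScaler(), PCA(n_components=10), SVC())
model.fit(X_train, y_train)

# Accuracy on data the model has never seen
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2%}")
```

Wrapping the steps in a pipeline means the scaler and PCA are fitted on the training part only, so nothing from the test set leaks into training.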
Check out the video to see the code in action, and do comment with your suggestions.
Thank You.
I am thankful to the mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a coding internship experience. Thank you, www.suvenconsultants.com


