Knearest neighbors
I have recently been very interested towards learning machine learning and have picked up a book to learn more about this topic.
The language used is Python, and NumPy is used for scientific calculations. I find this great, as I am pretty familiar with Python, and this means that I do not really have to spend a lot of time learning a new language.
Machine learning problems can be subdivided into 2 categories:

Supervised Learning – Given a dataset, predict/foretell certain values

Classification: Classify a problem into 1 of a fixed number of classes/types

Regression: Output a value in a given range


Unsupervised Learning – There is no dataset involved here

Clustering: Adjust your data into groups

Density Estimation: Get the probability of your data belonging to a group

The algorithm used for the OCR code is the knearest neighbors algorithm.
The steps involved are:

Get distances of your required value from dataSet members

Sort the distances in ascending order and choose the k smallest distances & corresponding dataset entries

Classify the dataset entries and return the class that contains the majority of the dataset entries.
Given below is a look into an application of the kNearest Neighbors classification algorithm – recognition of handwritten digits (0 – 9)
I was able to achieve an error rate of 0.0124 which is pretty good for a 1^{st} machine learning algo.
The script attached does the following:

Handwritten images are stores in arrays of size 32×32. The array has 0s and 1s as contents

Our classifier accepts an input vector of size 1024. The image is converted into this

Images from the training dataset are taken and the dataSet matrix is created and passed

Images from the test data set are used for testing.
The script is uploaded on Bitbucket