Handwritten numeral recognition using polar histogram of low-level stroke features
Abstract
Optical Character Recognition (OCR) is a technology that converts handwritten
as well as printed documents into digital documents. It is also important for converting
PDFs and images into an editable and searchable form. In the past
few decades, the world has moved rapidly towards digitization. A considerable
amount of data exists in PDFs or document images of handwritten
or printed documents, so this data needs to be converted to machine-encoded
form, which makes searching and modifying it simpler. This is where
OCR technology comes in.
The thesis focuses on handwritten numeral recognition for an Indian script,
Gujarati. The proposed method employs Low-Level Stroke (LLS) features for feature
extraction and a polar histogram for feature vector generation, which
enables a reduced-size representation of the features. The baseline experiments
were performed using a k-nearest neighbor (k-NN) classifier, and the results were improved
further using a support vector machine (SVM) classifier with a radial basis
function (RBF) kernel.
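As an illustration only, the idea of a polar histogram can be sketched as follows. The abstract does not specify the binning scheme, so everything here is an assumption: the function name polar_histogram, the parameters n_rings and n_sectors, the use of the centroid as the pole, and the treatment of each LLS feature as a plain (x, y) location. The sketch shows why the representation is compact: the vector length is n_rings * n_sectors, independent of the image size.

```python
import math

def polar_histogram(points, n_rings=3, n_sectors=8):
    """Hypothetical sketch: bin (x, y) feature locations by radius and angle
    around their centroid and return a normalized, flattened histogram.
    The bin counts and the centroid pole are assumptions, not the thesis's settings."""
    size = n_rings * n_sectors
    if not points:
        return [0.0] * size

    # Use the centroid of the feature locations as the pole (assumption).
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)

    # Convert every point to polar coordinates relative to the centroid.
    polar = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)
        theta = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi]
        polar.append((r, theta))

    # Scale radii by the largest one so the rings cover the whole numeral.
    r_max = max(r for r, _ in polar) or 1.0

    hist = [0.0] * size
    for r, theta in polar:
        ring = min(int(r / r_max * n_rings), n_rings - 1)
        sector = min(int(theta / (2 * math.pi) * n_sectors), n_sectors - 1)
        hist[ring * n_sectors + sector] += 1.0

    # Normalize so the vector does not depend on the number of features.
    total = sum(hist)
    return [h / total for h in hist]

# Usage: four hypothetical feature locations give a 24-dimensional vector.
features = polar_histogram([(0, 0), (2, 0), (0, 2), (2, 2)])
print(len(features))
```

A vector of this kind could then be passed to any standard classifier, such as the k-NN baseline or the RBF-kernel SVM mentioned above.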
The polar histogram of LLS features was also tested on Devanagari
and English handwritten numeral datasets. The classification accuracy for
Gujarati, Devanagari, and English is on par with state-of-the-art methods.
Experiments were also performed on the mixed datasets Gujarati-English,
Gujarati-Devanagari, English-Devanagari, and Gujarati-English-Devanagari. In
all experiments, the feature vector is significantly smaller while the accuracy is
not compromised much. The main contribution of the thesis is this
reduced feature vector size, obtained with the proposed polar histogram method
for feature vector generation.