Handwritten numeral recognition using polar histogram of low-level stroke features
Optical Character Recognition (OCR) is a technology that converts handwritten and printed documents into digital documents. It is also important for converting PDFs and images into an editable and searchable form. In the past few decades, the world has moved rapidly towards digitization, and a considerable amount of data now exists as PDFs or document images of handwritten or printed documents. This data needs to be converted into machine-encoded form, which makes searching and modifying it simpler; this is where OCR technology comes into the picture. This thesis focuses on handwritten numeral recognition for an Indian script, Gujarati. The proposed method employs Low-Level Stroke (LLS) features for feature extraction and the polar histogram method for feature vector generation, which yields a reduced-size representation of the features. Baseline experiments were performed using a k-nearest neighbor (k-NN) classifier, and the results were further improved using a support vector machine (SVM) classifier with a radial basis function (RBF) kernel. The polar histogram of LLS features was also tested on Devanagari and English handwritten numeral datasets. The classification accuracies for Gujarati, Devanagari, and English are on par with state-of-the-art methods. Experiments were also performed on the mixed datasets Gujarati-English, Gujarati-Devanagari, English-Devanagari, and Gujarati-English-Devanagari. In all experiments, the feature vector is significantly smaller while accuracy is not compromised much. The main contribution of the thesis is this reduced feature vector size, obtained with the proposed polar histogram method of feature vector generation.
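The abstract does not specify how the polar histogram is constructed, so the following is only a rough illustration of the general idea, not the thesis's method. It assumes, hypothetically, that each LLS feature can be reduced to a 2-D location (x, y) on the numeral image. The sketch centres a polar grid on the centroid of those points, splits it into angular sectors and concentric rings (`polar_histogram`, `n_angle_bins` and `n_radius_bins` are illustrative names and parameters, not taken from the source), and counts the fraction of points in each cell. However many stroke features a numeral produces, the output has the fixed length `n_angle_bins * n_radius_bins`. This compact, fixed-size vector is the kind of input a k-NN or RBF-kernel SVM classifier expects.

```python
import numpy as np


def polar_histogram(points, n_angle_bins=8, n_radius_bins=3):
    """Hypothetical sketch: bin 2-D feature locations into a polar grid.

    The grid is centred on the centroid of the points. Radii are divided by
    the largest radius so the result does not depend on the numeral's size.
    Returns a vector of length n_angle_bins * n_radius_bins whose entries
    are the fraction of points falling in each (ring, sector) cell.
    """
    pts = np.asarray(points, dtype=float)
    if pts.ndim != 2 or pts.shape[1] != 2 or len(pts) == 0:
        raise ValueError("points must be a non-empty (N, 2) array")

    # Coordinates relative to the centroid of all feature points.
    d = pts - pts.mean(axis=0)
    r = np.hypot(d[:, 0], d[:, 1])
    # arctan2 gives angles in [-pi, pi]; shift them into [0, 2*pi).
    theta = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)

    # Scale-normalise the radii (guard against all points coinciding).
    r_max = r.max()
    r_norm = r / r_max if r_max > 0 else r

    # Bin indices; np.minimum keeps boundary values (r_norm == 1, or an angle
    # that rounds to 2*pi) inside the last ring or sector.
    a_idx = np.minimum((theta / (2 * np.pi) * n_angle_bins).astype(int),
                       n_angle_bins - 1)
    r_idx = np.minimum((r_norm * n_radius_bins).astype(int),
                       n_radius_bins - 1)

    # Accumulate counts; np.add.at handles repeated (ring, sector) pairs.
    hist = np.zeros((n_radius_bins, n_angle_bins))
    np.add.at(hist, (r_idx, a_idx), 1)
    return hist.ravel() / len(pts)
```

For example, five points on a vertical stroke, `polar_histogram([(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)])`, produce a 24-element vector that sums to 1: one point (the centroid) in the inner ring, two in the middle ring and two in the outer ring. Dividing by the farthest radius is one simple way to make the descriptor size-invariant; the thesis may normalise differently.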
- M Tech Dissertations