dc.contributor.advisor: Joshi, Manjunath V.
dc.contributor.author: Prajapati, Pratik Kamlesh
dc.date.accessioned: 2020-09-14T05:57:49Z
dc.date.available: 2020-09-14T05:57:49Z
dc.date.issued: 2019
dc.identifier.citation: Prajapati, Pratik Kamlesh (2019). Optical character recognition (OCR) feature extraction and classification. Dhirubhai Ambani Institute of Information and Communication Technology, 47p. (Acc.No: T00763)
dc.identifier.uri: http://drsr.daiict.ac.in//handle/123456789/828
dc.description.abstract: Optical character recognition (OCR) [6] is the process of digitizing an image or document that contains text. An OCR system classifies the optical patterns in a digital image as alphanumeric and special characters. The main intermediate steps in character recognition are pre-processing, segmentation, feature extraction, and classification/recognition. Much past research has compared the performance of OCR approaches such as Support Vector Machines (SVM) [2], Hidden Markov Models (HMM) [7], feed-forward neural networks [8], Convolutional Neural Networks [9], and even transfer learning [3]. We propose using a Capsule Network [5] to improve OCR performance. In this thesis, we take up the problem of making OCR more robust across various types of documents and fonts. We also aim to avoid erroneous predictions when characters are segmented incorrectly. Doing so retains most of the important information in the document, which can later be used by downstream pipeline processes. Our approach keeps manual correction of the OCR output to a minimum. Complete numeric values matter most: a single erroneous character (digit) forces manual editors to retype the entire value, so correctly predicting the complete block of a numeric value is very important for us.
Keywords: Optical Character Recognition, Pre-processing, Segmentation, Feature Extraction
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Optical character recognition
dc.subject: pre-processing
dc.subject: segmentation
dc.subject: feature extraction
dc.classification.ddc: 006.424 PRA
dc.title: Optical character recognition (OCR) feature extraction and classification
dc.type: Dissertation
dc.degree: M.Tech
dc.student.id: 201711004
dc.accession.number: T00763
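
The abstract names four OCR stages (pre-processing, segmentation, feature extraction, classification) and argues that a numeric value is only useful if every digit is right. The sketch below is a toy, pure-Python illustration of both points on synthetic digit images. It is not the thesis's method, which proposes a Capsule Network [5]. The 5x3 bitmap font, the binarization threshold of 128, vertical-projection segmentation, resampled-pixel features, and the 1-nearest-neighbour classifier are all assumptions chosen for brevity, and every name in the code is hypothetical.

```python
"""Toy four-stage OCR pipeline for digit strings (illustrative sketch only).

Stages: pre-processing (binarization), segmentation (vertical projection),
feature extraction (resampled pixel grid), classification (1-nearest-neighbour).
"""

# Hypothetical 5x3 bitmap font: "1" = ink, "0" = background.
FONT = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}
GLYPH_H, GLYPH_W = 5, 3


def render(text, ink=20, paper=235):
    """Render a digit string as a grayscale image (list of rows), 1-px gaps."""
    rows = []
    for r in range(GLYPH_H):
        row = []
        for i, ch in enumerate(text):
            if i:
                row.append(paper)  # blank gap column between glyphs
            row.extend(ink if bit == "1" else paper for bit in FONT[ch][r])
        rows.append(row)
    return rows


def preprocess(gray, threshold=128):
    """Binarize: dark pixels become ink (1), light pixels background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment(binary):
    """Split on ink-free columns, then crop each glyph to its inked rows."""
    width = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(width)]
    glyphs, start = [], None
    for c in range(width + 1):
        inked = c < width and has_ink[c]
        if inked and start is None:
            start = c  # a new run of inked columns begins
        elif not inked and start is not None:
            crop = [row[start:c] for row in binary]
            ink_rows = [i for i, row in enumerate(crop) if any(row)]
            glyphs.append(crop[ink_rows[0]:ink_rows[-1] + 1])
            start = None
    return glyphs


def extract_features(glyph, out_h=GLYPH_H, out_w=GLYPH_W):
    """Nearest-neighbour resample to a fixed grid, flattened row by row."""
    h, w = len(glyph), len(glyph[0])
    return [glyph[r * h // out_h][c * w // out_w]
            for r in range(out_h) for c in range(out_w)]


# Reference feature vectors taken directly from the font bitmaps.
TEMPLATES = {ch: [int(b) for b in "".join(bm)] for ch, bm in FONT.items()}


def classify(features):
    """1-NN by Hamming distance; ties go to the first template in FONT order."""
    return min(TEMPLATES, key=lambda ch: sum(
        a != b for a, b in zip(features, TEMPLATES[ch])))


def recognize(gray):
    """Run all four stages and join the predicted digits into one value."""
    return "".join(classify(extract_features(g))
                   for g in segment(preprocess(gray)))


def char_accuracy(pred, truth):
    """Fraction of positions that match (assumes equal lengths)."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    print(recognize(render("2019")))  # 2019

    # Degrade one pixel: erasing the centre of "8" makes it look like "0".
    img = render("480")
    img[2][5] = 235
    pred = recognize(img)
    print(f"read {pred!r}: char accuracy {char_accuracy(pred, '480'):.2f}, "
          f"value correct: {pred == '480'}")
    # read '400': char accuracy 0.67, value correct: False
```

The degraded example mirrors the abstract's argument: two of three digits are still read correctly, yet the value as a whole is wrong and would have to be retyped, which is why whole-value correctness, rather than per-character accuracy, is the measure that matters for numeric fields.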