Segmentation and classification: optical character recognition algorithms to enhance the accuracy
Abstract
OCR (Optical Character Recognition) is a well-known technique for converting images of machine-printed or handwritten text into machine-encoded text. As the word "Optical" relates to light or vision, an OCR system takes images as input. These images are light intensities captured by camera or scanner sensors and stored as matrices. Character recognition is the process of identifying characters algorithmically, in an automated fashion. OCR remains an interesting problem for researchers in the computer vision community. Earlier OCR methods were mostly rule-based and did not generalise across a variety of datasets. With advances in machine learning, researchers began to address the problem with supervised classification and deep learning based algorithms. Existing OCR methods still perform poorly when the input is noisy or heterogeneous, i.e. when a single document contains multiple languages or the text appears in different fonts and sizes. Commercial OCR engines such as ABBYY SDK [1] and Tesseract [2] are available, but their accuracy for digit recognition is poor when the digits are underlined or appear inside tables. An approach consisting of table detection followed by preprocessing and recognition is proposed. After the table is localized and detected with the Faster R-CNN [3] architecture, the cropped region is passed to a binary classifier (Text or Numeric class) and then segmented at the character level. The segmented characters are given to a CNN for classification of digits and special characters.
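The character-level segmentation step can be illustrated with a minimal sketch. The abstract does not say which segmentation method the dissertation uses; the vertical projection profile below is an assumption chosen for illustration, and the function name `segment_characters`, its `min_width` parameter, and the toy input are hypothetical. It assumes an already binarized, cropped text line in which nonzero pixels are ink.

```python
import numpy as np

def segment_characters(binary, min_width=1):
    """Split a binarized text-line image into per-character column spans.

    binary: 2-D array, nonzero = ink, zero = background (assumed input format).
    Returns a list of (start, end) column indices, with end exclusive.
    """
    # Vertical projection profile: number of ink pixels in each column.
    ink_per_column = (np.asarray(binary) != 0).sum(axis=0)
    has_ink = ink_per_column > 0

    spans = []
    start = None
    for col, ink in enumerate(has_ink):
        if ink and start is None:
            start = col                      # a run of ink columns begins
        elif not ink and start is not None:
            if col - start >= min_width:     # drop runs narrower than min_width
                spans.append((start, col))
            start = None
    # Close a run that touches the right edge of the image.
    if start is not None and len(has_ink) - start >= min_width:
        spans.append((start, len(has_ink)))
    return spans

# Toy text line: three blobs separated by empty columns.
line = np.zeros((5, 10), dtype=np.uint8)
line[1:4, 1:3] = 1
line[0:5, 5:6] = 1
line[2:4, 7:9] = 1

spans = segment_characters(line)
crops = [line[:, s:e] for s, e in spans]  # one crop per candidate character
```

Each crop would then be resized and passed to the CNN classifier. A projection profile cannot separate touching characters, and an underline would merge an entire line into one span, so underlines would have to be removed during preprocessing before this step.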
Collections
- M Tech Dissertations [923]