Text retrieval from the degraded document images

Vasani, Hiral

Files

201311042.pdf (1.77 MB)

Date

2015

Authors

Vasani, Hiral

Publisher

Dhirubhai Ambani Institute of Information and Communication Technology

Abstract

Image binarization converts a colored document image into a black-and-white one. It can be viewed as an image segmentation task that separates the text from the background. The resulting black-and-white document is useful in many applications, such as Optical Character Recognition (OCR). Text documents suffer from various types of degradation, which makes binarization a challenging task. This thesis presents a technique that segments text from the background. In this method, the document image is first darkened to enhance the text (foreground). The document image is also processed separately to suppress the background. The two resulting images are combined so that the suppressed background is taken from the second image and the enhanced text from the first. This pre-processed image is then binarized using an existing thresholding technique, and the binarized image is post-processed to remove small unwanted components and other noise. The output image is compared with the ground truth using a set of evaluation parameters, and the results of the algorithm are compared with those of existing binarization techniques.
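The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the abstract does not name the individual operations, so every concrete choice here is an assumption made for illustration. Darkening is done with a gamma curve, background suppression divides each pixel by its local maximum, Otsu's method stands in for the "existing thresholding technique", small connected components are dropped as noise, and the F-measure stands in for the unspecified evaluation parameters. All function and parameter names (`darken`, `suppress_background`, `combine`, `bg_level`, `min_size`, and so on) are hypothetical.

```python
from collections import deque

def darken(img, gamma=2.0):
    """Assumed enhancement step: a gamma curve (> 1) pushes dark text darker."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]

def suppress_background(img, radius=1):
    """Assumed suppression step: divide by the local maximum (crude background estimate)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            bg = max(img[j][i]
                     for j in range(max(0, y - radius), min(h, y + radius + 1))
                     for i in range(max(0, x - radius), min(w, x + radius + 1)))
            row.append(round(255 * img[y][x] / bg) if bg else 0)
        out.append(row)
    return out

def combine(dark, supp, bg_level=200):
    """Keep near-white background from `supp`, text pixels from `dark`."""
    return [[s if s >= bg_level else d for d, s in zip(dr, sr)]
            for dr, sr in zip(dark, supp)]

def otsu_threshold(pixels):
    """Otsu's method on a flat list of 0-255 grey levels (first maximizer wins)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total, sum_all = len(pixels), sum(i * n for i, n in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def remove_small_components(binary, min_size=2):
    """Clear 4-connected text components with fewer than `min_size` pixels."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            comp, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) < min_size:
                for cy, cx in comp:
                    out[cy][cx] = 0
    return out

def binarize(img, gamma=2.0, radius=1, bg_level=200, min_size=2):
    """Full sketch: enhance, suppress, combine, threshold, clean. 1 = text, 0 = background."""
    merged = combine(darken(img, gamma), suppress_background(img, radius), bg_level)
    t = otsu_threshold([p for row in merged for p in row])
    binary = [[1 if p <= t else 0 for p in row] for row in merged]
    return remove_small_components(binary, min_size)

def f_measure(pred, truth):
    """F-measure of predicted text pixels against a ground-truth mask."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
```

Here `binarize` takes a greyscale image as a list of rows of 0-255 integers and returns a mask of the same shape. The default parameter values are arbitrary and would need tuning on real degraded documents, where the window radius in particular should exceed the stroke width.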

Keywords

Text retrieval, Information retrieval, Document, Text Extraction, Techniques

Citation

Vasani, Hiral (2015). Text retrieval from the degraded document images. Dhirubhai Ambani Institute of Information and Communication Technology, vii, 38 p. (Acc.No: T00536)

URI

http://drsr.daiict.ac.in/handle/123456789/573

Collections

M Tech Dissertations
