Text retrieval from the degraded document images
Abstract
Image binarization converts a color document image into a black-and-white one.
It can be treated as an image segmentation task that separates the text from
the background. The resulting black-and-white document is useful in many
applications, notably Optical Character Recognition (OCR). Text documents
suffer from various types of degradation, which make image binarization a
challenging task.
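To make the segmentation view concrete, here is a minimal sketch of plain global binarization, assuming Otsu's method as the threshold selector. The abstract does not name a specific thresholding technique, so `to_gray`, `otsu_threshold`, and `binarize` are illustrative names and choices, not the thesis method.

```python
def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to 0-255 gray levels (BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def otsu_threshold(gray):
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(gray, t):
    """Map pixels at or below t to 0 (text) and all others to 255 (background)."""
    return [[0 if v <= t else 255 for v in row] for row in gray]
```

A single global threshold like this works on clean scans but tends to fail on stains, uneven illumination, and faded ink, which is the kind of degradation that motivates extra pre-processing and post-processing.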
This thesis presents a technique that segments text from the background. In
this method, the document image is first darkened to enhance the text
(foreground). The image is also processed separately to suppress the
background. The two resulting images are combined so that the suppressed
background is taken from the second image and the enhanced text from the
first. This pre-processed image is then binarized using an existing
thresholding technique. The binarized image is post-processed to remove
unwanted small components and other noise. The output image is compared with
the ground truth using a set of evaluation parameters, and the results of the
algorithm are compared with those of existing binarization techniques.
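The stages described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the image is darkened, how the background is suppressed, how the two images are merged, which thresholding technique is used, or which evaluation parameters are reported. The power-law darkening, local-maximum background estimate, `bg_level` merge rule, fixed threshold, 4-connected component filter, and F-measure are all hypothetical stand-ins.

```python
from collections import deque


def darken(gray, gamma=2.0):
    """Assumed enhancement: power-law darkening (gamma > 1 pushes ink toward black)."""
    return [[round(255 * (v / 255) ** gamma) for v in row] for row in gray]


def suppress_background(gray, radius=2):
    """Assumed suppression: divide each pixel by the local maximum around it."""
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            background = max(max(window), 1)  # avoid division by zero
            row.append(min(255, round(255 * gray[y][x] / background)))
        out.append(row)
    return out


def combine(darkened, suppressed, bg_level=200):
    """Keep the flattened background from `suppressed`, the text from `darkened`."""
    return [[s if s >= bg_level else d for d, s in zip(drow, srow)]
            for drow, srow in zip(darkened, suppressed)]


def threshold(gray, t=128):
    """Stand-in for the existing thresholding step; True marks a text pixel."""
    return [[v <= t for v in row] for row in gray]


def remove_small_components(mask, min_size=3):
    """Drop 4-connected text components that have fewer than min_size pixels."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = False
    return out


def f_measure(predicted, truth):
    """Harmonic mean of precision and recall over text pixels (assumed metric)."""
    tp = fp = fn = 0
    for prow, trow in zip(predicted, truth):
        for p, t in zip(prow, trow):
            tp += int(p and t)
            fp += int(p and not t)
            fn += int(t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def binarize_document(gray):
    """Run the assumed pipeline: enhance, suppress, combine, threshold, clean up."""
    pre = combine(darken(gray), suppress_background(gray))
    return remove_small_components(threshold(pre))
```

Calling `binarize_document` on a grayscale image (a list of rows of 0-255 integers) returns a boolean text mask, which `f_measure` can then score against a ground-truth mask of the same size.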