Word segmentation and detection for Gujarati handwritten documents
Abstract
In this fast-evolving world, documents in numerous regional languages are finding a prominent place on the internet. This is evident from the increasing use of regional languages on advertisement hoardings and on the boards of stalls and shops; even essential government instructions are issued in regional languages. With the growing reach of the internet, the least privileged are also getting an opportunity to explore the world. Hence, more than a technological requirement, it has become a moral responsibility to direct research at the grassroots level toward serving those who have so far been left out. Segmenting and detecting words is the first and necessary step in a text recognition task. This work is therefore a preliminary step in exploring the efficiency of conventional techniques (morphological operations, connected component analysis, and contour finding) and deep learning techniques (EfficientDet, YOLO, and Faster R-CNN) in segmenting and detecting Gujarati handwritten words from scanned documents that were collected manually and annotated using the LabelImg tool. The conventional method serves as a pre-processing step in document annotation. Annotation is a tedious, monotonous, and time-consuming task, so the conventional method can automate the annotation process to some extent; the user only needs to correct the errors afterward. The results obtained on the collected Gujarati data with various state-of-the-art methods are encouraging. The EfficientDet neural network architecture performs better than the other deep learning techniques, YOLO and Faster R-CNN, evaluated on the same dataset.
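To illustrate the conventional route the abstract names (a morphological operation followed by connected component analysis), the sketch below shows the general idea on a tiny binary image. It is not the dissertation's implementation: the function names `dilate_horizontal` and `word_boxes`, the `radius` parameter, and the toy `page` are all assumptions made for illustration, and real scanned pages would normally be processed with an image library such as OpenCV rather than pure Python lists.

```python
from collections import deque


def dilate_horizontal(img, radius):
    """Binary dilation with a 1 x (2*radius + 1) horizontal structuring element.

    Smears ink sideways so that strokes of the same word touch each other.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            lo, hi = max(0, x - radius), min(w, x + radius + 1)
            if any(img[y][lo:hi]):
                out[y][x] = 1
    return out


def word_boxes(img, radius=1):
    """Return sorted (x_min, y_min, x_max, y_max) boxes, one per candidate word.

    Components are found on the dilated image (8-connectivity), but each box
    is tightened to the original ink pixels inside that component.
    """
    merged = dilate_horizontal(img, radius)
    h, w = len(merged), len(merged[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not merged[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected component.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0, y0, x1, y1 = w, h, -1, -1
            while queue:
                cy, cx = queue.popleft()
                if img[cy][cx]:  # only original ink defines the box
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and merged[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return sorted(boxes)


# Toy "page": two strokes close together (one word), then a wider gap, then a second word.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = word_boxes(page, radius=1)  # two boxes: the nearby strokes are merged
```

With `radius=0` (no dilation) the two nearby strokes are reported as separate components, which is why the dilation width matters: too small splits words into fragments, too large merges neighbouring words. Boxes produced this way could serve as draft annotations that a user then corrects, in the spirit of the semi-automatic annotation the abstract describes.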
Collections
- M Tech Dissertations