Text description of image
Abstract
An image is easy for a human being to understand but difficult for a machine to interpret. In this thesis, we propose an algorithm that produces a textual description of image content. The problem statement is as follows: given an image as input, our system automatically generates a text description of that image as output. Our aim is to understand the scene in an image, i.e., to describe it automatically in simple, meaningful English sentences. We have developed a stepwise procedure for this task, consisting of four steps:
1) Segmentation
2) Recognition
3) Labelling
4) Sentence generation
In the first step, segmentation is carried out using a novel active contour model approach to separate the objects from the background in the image. Segmentation delineates the object boundaries and yields the distinct regions present in the image, which supports the second step, object recognition. Object recognition is the task of detecting and identifying objects in the scene based on feature vectors extracted from the image regions. We extract the features using SIFT (Scale Invariant Feature Transform) because of their invariance properties, which are useful for recognizing an object. SIFT provides keypoint descriptors, which we use for labelling the object. Our method tries to recognize occluded and cluttered objects in the image while simultaneously improving segmentation through recognition and vice versa. The third step is labelling the recognized objects, i.e., determining which category each object belongs to and associating a label with it; these labels are used in the final step. We use an SVM (Support Vector Machine) classifier to classify the objects. The final step is sentence generation, which is accomplished by linking the labels according to their meanings to form meaningful sentences as the output of our system.
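The final step can be illustrated with a minimal sketch. The abstract does not specify how labels are linked, so everything below is an assumption made for illustration: the `LabelledObject` record, the use of region centres to derive spatial relations, and the fixed sentence templates are hypothetical and are not the method of the thesis. The sketch only shows how a set of labels produced by the classifier might be turned into simple English sentences.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class LabelledObject:
    """Hypothetical output of the labelling step for one image region."""
    label: str   # category name assigned by the classifier
    x: float     # horizontal centre of the region, in pixels
    y: float     # vertical centre of the region, in pixels (grows downward)


NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five"}


def noun_phrase(label, count):
    """Return e.g. 'one dog' or 'two dogs' (naive pluralisation with 's')."""
    word = NUMBER_WORDS.get(count, str(count))
    noun = label if count == 1 else label + "s"
    return f"{word} {noun}"


def join_phrases(phrases):
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    if len(phrases) == 1:
        return phrases[0]
    if len(phrases) == 2:
        return f"{phrases[0]} and {phrases[1]}"
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]


def relation_sentence(a, b):
    """Describe where object a lies relative to object b, using centres only."""
    dx, dy = a.x - b.x, a.y - b.y
    if abs(dx) >= abs(dy):
        relation = "to the left of" if dx < 0 else "to the right of"
    else:
        relation = "above" if dy < 0 else "below"
    return f"The {a.label} is {relation} the {b.label}."


def describe(objects):
    """Build a summary sentence plus one relation sentence per object pair."""
    if not objects:
        return ["No objects were recognized in the image."]
    counts = Counter(obj.label for obj in objects)
    phrases = [noun_phrase(label, n) for label, n in counts.items()]
    sentences = [f"The image contains {join_phrases(phrases)}."]
    # Relate only objects whose label is unique, so "the dog" is unambiguous.
    unique = [obj for obj in objects if counts[obj.label] == 1]
    for a, b in combinations(unique, 2):
        sentences.append(relation_sentence(a, b))
    return sentences


if __name__ == "__main__":
    scene = [LabelledObject("dog", 40, 120), LabelledObject("car", 200, 110)]
    for sentence in describe(scene):
        print(sentence)
```

Fixed templates keep the output grammatical with no language model, at the cost of variety. Linking labels by meaning, as the thesis describes, would require richer knowledge about the categories than the positions used here.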
Collections
- M Tech Dissertations [923]