Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/1000
Full metadata record
DC Field: Value
dc.contributor.advisor: Khare, Manish
dc.contributor.author: Patel, Abhikumar
dc.contributor.other: Kumar, Ahlad
dc.date.accessioned: 2022-05-06T05:46:40Z
dc.date.available: 2023-02-19T05:46:40Z
dc.date.issued: 2021
dc.identifier.citation: Patel, Abhikumar (2021). Image Captioning Using Visual And Semantic Attention Mechanism. Dhirubhai Ambani Institute of Information and Communication Technology. viii, 40 p. (Acc.No: T00940)
dc.identifier.uri: http://drsr.daiict.ac.in//handle/123456789/1000
dc.description.abstract: Image captioning is the task of generating captions/descriptions for an image. It has many applications in various fields, such as image indexing for content-based image retrieval, self-driving cars, assistance for visually impaired persons, smart surveillance systems, and more. It connects two major research communities: computer vision and natural language processing. The main challenges in image captioning are to recognize the important objects, their attributes, and the visual relationships between objects within an image, and then to generate syntactically and semantically correct sentences. Currently, most image captioning architectures are based on the encoder-decoder model, in which the image is first encoded using a CNN to obtain an abstract representation, which is then decoded using an RNN to produce a proper caption. I selected a base paper that applies visual attention to the image to attend to the most appropriate region while generating each word of the caption. However, that model misses one important factor while generating the caption: the visual relationships between the objects present in the image. I therefore added a relationship detector module to the model to take these relationships into account. After combining this module with the existing Show, Attend and Tell model, we obtain captions that consider the relationships between objects, which ultimately enhances caption quality. I performed experiments on various publicly available standard datasets, namely the Flickr8k, Flickr30k, and MSCOCO datasets.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Encoder
dc.subject: Decoder
dc.subject: Convolutional Neural Network
dc.subject: Recurrent Neural Network
dc.subject: Visual relationship detector
dc.subject: Attention mechanism
dc.classification.ddc: 006.3 PAT
dc.title: Image Captioning Using Visual And Semantic Attention Mechanism
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 201911012
dc.accession.number: T00940
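The soft visual attention described in the abstract, in which the decoder scores each image region against its current hidden state and forms an attention-weighted context vector, can be sketched roughly as follows. This is an illustrative toy example in plain Python (all function and variable names are hypothetical), not the dissertation's actual implementation, which uses CNN features and an RNN decoder.

```python
import math

def softmax(scores):
    # Numerically stable softmax: weights are positive and sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, hidden_state):
    """Soft attention sketch: score each image region against the
    decoder hidden state (dot product), then return the attention
    weights and the weighted-sum context vector."""
    scores = [sum(f * h for f, h in zip(region, hidden_state))
              for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted sum of region features.
    context = [sum(w * region[d]
                   for w, region in zip(weights, region_features))
               for d in range(dim)]
    return weights, context

# Toy example: 3 image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, 0.5]
alphas, ctx = attend(regions, hidden)
print(alphas, ctx)
```

In the real model the regions come from a convolutional feature map, the scoring function is a small learned network rather than a raw dot product, and the context vector is fed into the RNN at each decoding step.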
Appears in Collections:M Tech Dissertations

Files in This Item:
File: 201911012_Final MTT - Manish Khare.pdf
Description: Restricted Access
Size: 12.05 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.