Image captioning and neural architecture search using reinforcement learning
Abstract
With the advent of deep learning, the problem-solving capability of machines has increased dramatically. The past decade has seen deep neural networks succeed in many difficult areas, such as image and speech processing, machine translation and natural language understanding. A primary goal of computer vision is scene understanding, and automatically producing a descriptive caption for an image comes fairly close to its essence. An image captioning model must therefore be powerful enough to capture the entire content of an image and to convey how its elements relate to one another in natural language. Inspired by this challenging task, in the first part of the thesis we attempt to solve image captioning using an attention mechanism with the help of reinforcement learning. Reinforcement learning (RL) is a machine learning technique concerned with how a software agent should act in an environment so as to maximise cumulative reward, which makes it well suited to decision-making problems. Developing a neural network model requires substantial architecture engineering. A suitable architecture can sometimes be obtained through transfer learning, but achieving the best possible performance usually means designing the network from scratch, which requires specialised skills and is challenging in general. Neural Architecture Search (NAS) is a technique that automatically searches for the best neural network architecture for a task. As a step towards building the network for the first problem automatically, in the second part of the work we attempt to implement NAS using RL on the elementary problem of digit classification.
Collections
- M Tech Dissertations [923]