M Tech Dissertations
Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3
Search Results (4 items)
Item Open Access
Video captioning (2020). Laheri, Vishal Bharatkumar; Mandal, Srimanta
In recent years, models for the video captioning task have improved considerably. Despite this advancement, the task is still impeded by hardware constraints. Video captioning models take a sequence of images and a caption as inputs, which makes this one of the most memory- and computation-intensive tasks. In this project work, we exploit the importance of selecting the frames required from the video to obtain the desired performance. We also propose embedding a video summarizing model within the captioning model to select frames dynamically, which reduces the number of required frames without losing the spatio-temporal information of the video.

Item Open Access
Extraction of Questionnaire Having Yes/No Format from Scanned Images (2020). Neeli, Siva Sravana Kumar; Joshi, Manjunath
The problem at hand is to digitize scanned documents and extract the questionnaire from every page in the form of small image snippets. To obtain the required questionnaire from a scanned document, the document first goes through OCR; from the resulting data, we refine image portions using computer vision techniques and algorithms. This process gives us the required regions of interest (image snippets). Finally, we classify them using a CNN model.

Item Open Access
Bag of words (BoW) generation from given features for optimizing feature matching in V-SLAM application (2020). Varshney, Kratika; Mandal, Srimanta
The project aims at generating a bag of words (BoW) from given features for optimizing feature matching in a V-SLAM application. Simultaneous localization and mapping (SLAM) is the process by which an ego vehicle builds a global map of its current environment and uses this map to navigate or deduce its location at any point in time. Visual SLAM is a specific type of SLAM that performs localization and mapping by leveraging vision-based sensors (such as a monocular or stereo camera) when neither the environment nor the location of the sensor is known. This report illustrates the use of a BoW vocabulary for loop detection. A BoW vocabulary is generated with a feature extractor/descriptor by reading a database of images. The report also analyses the effect of varying direct index levels on an independent framework known as DLoopDetector. The vocabulary generation and evaluation frameworks are studied, modified for the relevant features, and experimented upon with respect to the various parameter choices involved.
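The vocabulary generation described in the abstract above is carried out with the DLoopDetector framework, which uses a hierarchical vocabulary in C++. As a minimal sketch of the underlying idea only, and not the project's actual pipeline, the Python/OpenCV snippet below clusters ORB descriptors from a database of images into a flat k-means vocabulary and represents each image as a histogram over visual words; the helper names (build_vocabulary, bow_histogram), the vocabulary size, and the ORB settings are illustrative assumptions.

```python
# Simplified sketch of BoW vocabulary generation and image representation.
# Assumes OpenCV (cv2) and NumPy; not the project's actual DLoopDetector setup.
import cv2
import numpy as np

def build_vocabulary(image_paths, n_words=500):
    """Cluster local descriptors from a database of images into visual words."""
    orb = cv2.ORB_create(nfeatures=1000)
    all_desc = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(img, None)
        if desc is not None:
            all_desc.append(desc.astype(np.float32))  # k-means needs float data
    all_desc = np.vstack(all_desc)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 50, 1e-3)
    _, _, centers = cv2.kmeans(all_desc, n_words, None, criteria, 3,
                               cv2.KMEANS_PP_CENTERS)
    return centers  # each row is one visual word

def bow_histogram(img, vocabulary):
    """Represent a grayscale image as a normalized histogram over visual words."""
    orb = cv2.ORB_create(nfeatures=1000)
    _, desc = orb.detectAndCompute(img, None)
    desc = desc.astype(np.float32)
    # assign each descriptor to its nearest visual word (flat nearest-centre search)
    dists = np.linalg.norm(desc[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float32)
    return hist / (hist.sum() + 1e-9)

# Loop-closure candidates are earlier keyframes whose BoW histograms are most
# similar to the query frame's histogram (e.g. by cosine similarity).
```

In a production vocabulary such as the one DLoopDetector relies on, the visual words are organised as a hierarchical tree so that assigning a descriptor to a word takes logarithmic rather than linear time; the flat nearest-centre search here is only for clarity.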
Item Open Access
Estimating depth from monocular video under varying illumination (Dhirubhai Ambani Institute of Information and Communication Technology, 2014). Sarupuri, Bhuvaneshwari; Tatu, Aditya
The ability to perceive depth and reconstruct the 3D surface of a scene is basic to many areas of computer vision. Since a 2D image is the projection of a 3D scene onto two dimensions, information about depth is lost. Many methods have been introduced to estimate depth using one, two, or multiple images, but most of the previous work on depth estimation has been carried out in the field of stereo vision. These stereo techniques need two images and a whole setup to acquire them, and they face many setbacks in correspondence and hardware implementation. Many cues can be used to model the relation between depth and image features, so that depth can be learned from a single image using multi-scale Markov Random Fields [1]. Here we use Gabor filters to extract a texture-variation cue and improve the depth estimate using shape features. The same approach is used for estimating depth from videos by incorporating temporal coherence. To do this, optical flow is used, and we introduce a novel method of computing optical flow using texture features. Since texture features capture dominant properties of an image that are almost invariant to illumination, the texture-based optical flow is robust to large uniform illumination changes, which has many applications in outdoor navigation and surveillance.
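As a loose illustration of the texture cue described in the last abstract, and not the thesis's actual formulation, the sketch below aggregates a small Gabor filter bank into a texture map and then runs a standard dense optical flow method (Farneback) on consecutive texture maps instead of raw intensities; the function names, filter parameters, and flow settings are all illustrative assumptions.

```python
# Minimal sketch: Gabor responses as illumination-insensitive texture features,
# with dense optical flow computed on the texture maps. Parameters are assumed.
import cv2
import numpy as np

def gabor_texture(gray, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Aggregate responses of a small Gabor filter bank into one texture map."""
    responses = []
    for theta in thetas:
        kernel = cv2.getGaborKernel(ksize=(21, 21), sigma=4.0, theta=theta,
                                    lambd=10.0, gamma=0.5, psi=0)
        responses.append(cv2.filter2D(gray.astype(np.float32), -1, kernel))
    texture = np.max(np.stack(responses), axis=0)  # keep dominant orientation response
    return cv2.normalize(texture, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

def texture_optical_flow(frame_prev, frame_next):
    """Dense flow between the texture maps of two consecutive video frames."""
    t0 = gabor_texture(cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY))
    t1 = gabor_texture(cv2.cvtColor(frame_next, cv2.COLOR_BGR2GRAY))
    return cv2.calcOpticalFlowFarneback(t0, t1, None, pyr_scale=0.5, levels=3,
                                        winsize=15, iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
```

Because the Gabor responses emphasise local texture rather than absolute brightness, flow computed on them is less disturbed by uniform illumination changes between frames, which is the property the abstract relies on for outdoor navigation and surveillance settings.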