M Tech Dissertations
Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3
Item Open Access Shadow Detection and Removal from Video using Deep Learning (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Dodiya, Krutika; Khare, Manish; Gohel, Bakul
The removal of shadows from images is crucial in computer vision, as it can enhance the interpretability and visual quality of images. This research work proposes a cascade U-Net architecture for shadow removal, consisting of two stages of U-Net. In the first stage, a U-Net is trained on shadow images and their corresponding ground truth to predict shadow-free images. The second stage takes the predicted shadow-free images and the ground truth as input to another U-Net, which further refines the shadow-removal results. This cascade architecture enables the model to learn and refine shadow removal progressively, leveraging both the initial predictions and the ground truth. Experimental evaluations on benchmark datasets demonstrate that our approach achieves notably good performance in both qualitative and quantitative terms. Using objective metrics such as the Structural Similarity Index (SSIM) and the Root Mean Square Error (RMSE), together with subjective evaluations in which human observers rate the quality of the shadow-removal results, our approach was found to outperform other state-of-the-art methods. Overall, the proposed cascade U-Net architecture offers a promising solution for shadow removal that can improve image quality and interpretability.

Item Open Access Image Processing Using Digital Programming on FPGA (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Kachchhi, Hardi; Agrawal, Yash; Khare, Manish
Image processing transforms an image into digital form and performs operations on it that improve the image for human interpretation and extract useful information from it. It is essential for a wide range of applications.
It allows for enhancing and restoring images, extracting features for object recognition, compressing images for efficient storage and transmission, analyzing images for computer vision tasks, enabling medical diagnostics and treatment, and interpreting data from remote sensing. Field Programmable Gate Arrays (FPGAs) are preferred for image processing due to their parallel processing capabilities, reconfigurability, low latency, energy efficiency, pipelining support, customization options, real-time processing capabilities, and ease of integration. These advantages make FPGAs a powerful tool for implementing high-performance and efficient image-processing solutions across various applications. To implement various image-processing filters, we have developed a method that performs several edge detection techniques on an FPGA and displays the image on a monitor through a Video Graphics Array (VGA) controller. Edge detection and blurring filters are an indispensable part of image processing in various fields due to their ability to extract information, enhance visual quality, and enable decision-making based on visual data.

Item Open Access Image segmentation fusion by edge detection techniques (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Koshti, Nishant; Banerjee, Asim
Image segmentation is one of the basic building blocks of image processing. It is a pre-processing task that makes an image amenable to further operations such as noise removal, decomposition, and morphological operations. It is the first step in identifying objects in an image. It may also be used in compression to compress different areas or segments of an image at different compression qualities. It differentiates objects in an image from the background. There are different types of segmentation techniques, such as color, region growing, split-and-merge, grayscale, and edge detection. The technique that should be applied mostly depends on the kind of image
given. Segmentation mainly derives the homogeneity of an image: it partitions an image into distinct regions that are meant to correlate strongly with objects or features of interest. Segmentation can also be regarded as a process of grouping together pixels that have similar attributes. The level to which the subdivision is carried depends on the problem being solved; that is, segmentation should stop when the objects of interest in an application have been isolated. There is no point in carrying segmentation past the level of detail required to identify those elements.

Item Open Access Learning cross domain relations using deep learning (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Kotecha, Dhara; Joshi, Manjunath V.
Generative Adversarial Networks (GANs) have achieved exemplary performance in generating realistic images, and also perform image-to-image translation with good results. In this thesis, we explore the use of GANs for cross-domain image mapping for facial expression transfer, in which the expressions of a source image are transferred onto a target image. We use a DiscoGAN (Discovery GAN) model for the task. Using a DiscoGAN, an image of the target is generated with the facial features of the source. It uses a feature-matching loss along with the GAN objective and a reconstruction loss. We propose a method to train the DiscoGAN with paired source and target images. In order to learn the cross-domain image mapping, we train the DiscoGAN with a batch size of 1. In our next work, we propose an algorithm to binarize degraded document images. We incorporate U-Net for the task at hand, modeling document image binarization as a classification problem in which we generate an image that results from classifying each pixel as text or background.
By optimizing the cross-entropy loss function, we translate the input degraded image to the corresponding binarized image. Our use of U-Net ensures low-level feature transfer from the input degraded image to the output binarized image, and is thus better than using a simple convolutional neural network. Our training method reaches the desired results faster when both the degraded document and the ground-truth binarized images are available for training, and it also generalizes well. The results obtained are significantly better than state-of-the-art techniques, and the approach is simpler than other deep learning approaches for document image binarization.

Item Open Access Detection and localization of tampering in a digital medical image using discrete wavelet transform (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Gadhiya, Tushar; Roy, Anil K.; Mitra, Suman K.
The use of digital images as a diagnostic tool has increased tremendously in medical science, making investigation easier and quicker. But at the same time it raises the question of the authenticity of the image under scrutiny. Authenticity of digital images is very important in areas such as scientific research, legal proceedings, lifestyle publications, brand marketing, forensic investigations, and government documents. With powerful and easy-to-use image editing software like Microsoft Paint and Photoshop, it has become extremely easy to tamper with a digital image for malicious purposes. The digital form of the image has drawn the attention of many researchers towards automatic diagnosis systems for image analysis and enhancement. Such systems use harmless image manipulation operations like brightness enhancement, gamma correction, and contrast enhancement, which improve the quality of the image and help in better diagnosis, and so should not be considered tampering.
Tampering with malicious intent may be found in medical claims, health insurance, or even legal battles in which a medical problem may influence the judicial decision. Since the use of digital images in the medical profession is still in a nascent stage, we address the likely misuse of such input in this thesis. We propose an algorithm that enables anybody to detect whether a tampering was done with such malicious intent; if so, an almost precise localization of the tampering can also be performed on a suspect digital medical image. The basis of our proposed algorithm is a hash-based representation of a digital image, using the discrete wavelet transform as a tool. It allows us to identify the direction of tampering, which helps us converge on the tampered object in the localization area. We show that our algorithm is robust against harmless manipulations yet sensitive to even minute tampering. In the case of multiple tamperings, the proposed method is able to identify the location as well as the direction of each, where some existing methods fail. Our proposed technique is fast and generates a smaller hash, as it works with a smaller hash function than comparable available techniques.

Item Open Access Object-background segmentation from video (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Domadiya, Prashant; Mitra, Suman K.
Fast and accurate algorithms for background-foreground separation are an essential part of any video surveillance system. GMM (Gaussian Mixture Models) based object segmentation
methods give accurate results for background-foreground separation problems but are
computationally expensive. In contrast, modeling with only a single Gaussian improves the time complexity but reduces accuracy due to variations in illumination and the
dynamic nature of the background. It is observed that these variations affect only a few
pixels in an image. Most of the background pixels are unimodal. We propose a method
to account for the dynamic nature of the background and low lighting conditions. It is an
adaptive approach in which each pixel is modeled either as a single Gaussian or as a mixture of Gaussians. The flexibility in the number of Gaussians used to model each pixel, together with learning only when it is required, reduces the time complexity of the algorithm significantly. To resolve false negatives caused by homogeneity of color and texture between foreground and background, spatial smoothing is carried out by K-means, which improves the overall accuracy of the proposed algorithm. Shadows cause problems in many applications that rely on segmentation results: since a shadow changes the RGB values of pixels, an RGB-value-dependent GMM-based method cannot remove shadows from the detection results. A preprocessing stage involving an illumination-invariant representation takes care of object shadows as well.
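The per-pixel unimodal case described in the abstract can be sketched as follows. This is an illustrative sketch only: the deviation threshold `k`, the learning rate `lr`, and the update rule are assumptions for demonstration, not the thesis implementation.

```python
import numpy as np

def update_background(mean, var, frame, lr=0.05, k=2.5):
    """One frame of per-pixel single-Gaussian background modelling.

    A pixel is marked foreground when it deviates from its Gaussian
    mean by more than k standard deviations; background pixels update
    the running mean and variance. Parameters are illustrative
    assumptions, not values from the thesis.
    """
    frame = frame.astype(float)
    foreground = np.abs(frame - mean) > k * np.sqrt(var)
    background = ~foreground
    diff = frame - mean
    # Update the model only where the pixel matched the background.
    new_mean = np.where(background, mean + lr * diff, mean)
    new_var = np.where(background, (1 - lr) * var + lr * diff ** 2, var)
    return new_mean, np.maximum(new_var, 1e-6), foreground
```

In a full pipeline along the lines the abstract describes, pixels whose single-Gaussian model repeatedly fails would be promoted to a multimodal mixture, and the foreground mask would then be smoothed spatially (e.g. by K-means).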
Item Open Access Estimating depth from monocular video under varying illumination (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Sarupuri, Bhuvaneshwari; Tatu, Aditya
The ability to perceive depth and reconstruct the 3D surface of a scene is a basic function in many areas of computer vision. Since a 2D image is the projection of a 3D scene onto two dimensions, the information about depth is lost. Many methods have been introduced to estimate depth using a single image, two images, or multiple images, but most previous work on depth estimation has been carried out in the field of stereo vision. Stereo techniques need two images and a whole setup to acquire them, and they suffer setbacks in correspondence and hardware implementation. Many cues can be used to model the relation between depth and image features and to learn depth from a single image using multi-scale Markov Random Fields [1]. Here we use Gabor filters to extract a texture-variation cue and improve the depth estimate using shape features. The same approach is used for estimating depth from videos by incorporating temporal coherence. To do this, optical flow is used, and we introduce a novel method of computing optical flow from texture features. Since texture features capture dominant properties of an image that are almost invariant to illumination, the texture-based optical flow is robust to large uniform illumination changes, which has many applications in outdoor navigation and surveillance.

Item Open Access Automatic target image detection for morphing (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Vyas, Jaladhi; Joshi, Manjunath V.
In this thesis, we propose a novel approach for automatic target image detection for morphing, based on 3D textons and contrast. Given a source image consisting of a human frontal face and training images of human and animal faces, our algorithm finds the target image automatically from the target database.
There are two major advantages of our approach. It solves the problem of manual selection of the target image, as done by researchers in the morphing community, and by detecting it automatically one may achieve a smooth transition from source to destination. Our algorithm aims at finding the best target animal face image given a human face as the source. A histogram model based on 3D textons and contrast is built, and the chi-square distance between the histogram models of the source and target images is used to find the best target. After detecting the target image, the control points for the source and target images are detected automatically using facial geometry, an eye-map operator, and K-means clustering. The superiority of our algorithm over other methods is that it needs only the source image and the training database, and the entire morphing process is done automatically. The experiments were conducted using four classes of images (human, cheetah, lion, and monkey), with the human class used as the source. Our target detection results are verified using the Structural Similarity Index (SSIM) between the source and the intermediate morphed image. Experiments on a fairly large dataset have been carried out to show the usefulness and capability of our method.

Item Open Access Manifold valued image segmentation (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Bansal, Sumukh; Tatu, Aditya
Image segmentation is the process of partitioning an image into different regions or groups based on characteristics such as color, texture, motion, or shape. Segmentation is an intermediate step in a large number of applications, including object recognition and detection. Active contour is a popular variational model for object segmentation in images, in which the user initializes a contour that evolves to optimize an objective function designed such that the desired object boundary is the optimal solution.
Recently, imaging modalities that produce manifold-valued images have come up, for example DT-MRI images and vector fields. The traditional active contour model does not work on such images. In the work presented here we generalize the active contour model to manifold-valued images. Since ordinary gray-scale images are a specific example of manifold-valued images, our method produces the expected results on gray-scale images. As an application of the proposed active contour model, we perform texture segmentation on gray-scale images by first creating an appropriate manifold-valued image. We demonstrate segmentation results for manifold-valued images and texture images. The diversity of the texture segmentation problem inspired us to propose a new active contour model for texture segmentation, in which we find the background and foreground texture regions of a given image by maximizing the geodesic distance between the interior and exterior covariance matrices. We also provide results using the proposed method.

Item Open Access Locality preserving projection: a study and applications (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Shikkenawis, Gitam; Mitra, Suman K
Locality Preserving Projection (LPP) is a recently proposed approach for dimensionality reduction that preserves neighbourhood information and obtains a subspace that best detects the essential manifold structure of the data. It is widely used for finding the intrinsic dimensionality of data that is usually of high dimension. This characteristic of LPP has made it popular among other available dimensionality reduction approaches such as Principal Component Analysis (PCA). A study of LPP reveals that it tries to preserve information about the nearest neighbours of data points, and thus may lead to misclassification in the overlapping regions of two or more classes while performing data analysis.
It has also been observed that the dimension-reduction capacity of conventional LPP is much less than that of PCA. A new proposal called Extended LPP (ELPP), which amicably resolves the two issues mentioned above, is introduced. In particular, a new weighting scheme is designed that gives importance to data points at a moderate distance, in addition to the nearest points. This helps resolve the ambiguity occurring in the overlapping regions as well as increase the reduction capacity. LPP is used in a variety of dimensionality-reduction applications, one of which is face recognition, among the most widely used biometric technologies for person identification. Face images are represented as high-dimensional pixel arrays, and due to the high correlation between neighbouring pixel values they often belong to an intrinsically low-dimensional manifold. The distribution of data in a high-dimensional space is non-uniform and is generally concentrated around some kind of low-dimensional structure. Hence, one way of performing face recognition is to reduce the dimensionality of the data and find the subspace of the manifold in which face images reside. Both LPP and ELPP are used for face and expression recognition tasks. As the aim is to separate the clusters in the embedded space, class membership information may add more discriminating power. With this in mind, the proposal is further extended to a supervised version of LPP (SLPP) that uses the known class labels of data points to enhance the discriminating power, along with inheriting the properties of ELPP.
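The conventional LPP that this study builds on can be sketched as follows. The heat-kernel affinity over all pairs is a standard-LPP assumption for illustration; the thesis's ELPP contribution would replace this weighting with one that also emphasizes moderate-distance neighbours.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, t=1.0, n_components=2):
    """Conventional Locality Preserving Projection (sketch).

    X has shape (n_samples, n_features). Builds a heat-kernel
    affinity W, forms the graph Laplacian L = D - W, and solves the
    generalized eigenproblem  X^T L X a = lambda X^T D X a,
    keeping the eigenvectors with the smallest eigenvalues as
    projection directions.
    """
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / t)            # heat-kernel weights
    np.fill_diagonal(W, 0.0)       # no self-affinity
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-8 * np.eye(X.shape[1])  # small ridge for stability
    _, vecs = eigh(A, B)           # eigenvalues returned in ascending order
    return vecs[:, :n_components]  # columns are projection directions
```

Projecting face images with `Y = X @ lpp(X)` would give the low-dimensional embedding on which recognition is performed; a supervised variant such as SLPP would additionally fold class labels into W.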