M Tech Dissertations
Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3
Search Results
Item Open Access Image segmentation fusion by edge detection techniques (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Koshti, Nishant; Banerjee, Asim. Image segmentation is one of the basic building blocks in image processing. It is a pre-processing task that makes an image amenable to further operations such as noise removal, decomposition, morphological operations, etc., and it is the first step in identifying objects in an image. It may also be used in compression, to compress different areas or segments of an image at different compression qualities. It separates the objects in an image from the background. There are different types of segmentation techniques, such as colour, region growing, split and merge, grayscale and edge detection, and the technique to apply depends mostly on the kind of image given. Segmentation mainly exploits the homogeneity of an image: it partitions an image into distinct regions that are meant to correlate strongly with objects or features of interest in the image. Segmentation can also be regarded as a process of grouping together pixels that have similar attributes. The level to which the subdivision is carried depends on the problem being solved; that is, segmentation should stop when the objects of interest in an application have been isolated. There is no point in carrying segmentation past the level of detail required to identify those elements.

Item Open Access Learning cross domain relations using deep learning (Dhirubhai Ambani Institute of Information and Communication Technology, 2018) Kotecha, Dhara; Joshi, Manjunath V. Generative Adversarial Networks (GANs) have achieved exemplary performance in generating realistic images, and they also produce good results for image-to-image translation. In this thesis, we explore the use of GANs for cross-domain image mapping for facial expression transfer, in which the expressions of a source image are transferred onto a target image. We use a DiscoGAN (Discovery GAN) model for the task: an image of the target is generated with the facial features of the source. It uses a feature matching loss along with the GAN objective and a reconstruction loss. We propose a method to train the DiscoGAN with paired source and target images, and in order to learn the cross-domain mapping we train it with a batch size of 1. In our next work in this thesis, we propose an algorithm to binarize degraded document images, incorporating a U-Net for the task. We model document image binarization as a classification problem in which we generate an image that is the result of classifying each pixel as text or background. By optimizing the cross-entropy loss, we translate the input degraded image to the corresponding binarized image. Using a U-Net ensures low-level feature transfer from the input degraded image to the output binarized image, and is thus better than using a simple convolutional neural network. Our training method reaches the desired results faster when both the degraded document and the ground-truth binarized images are available for training, and it also generalizes well. The results obtained are significantly better than state-of-the-art techniques, and the approach is simpler than other deep learning approaches for document image binarization.
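A minimal sketch of the pixel-classification formulation of binarization described in the entry above, written in PyTorch: a toy encoder-decoder with one skip connection, trained with pixel-wise binary cross entropy on synthetic tensors. The architecture, tensor sizes and hyperparameters are illustrative assumptions, not the network used in the thesis.

# Illustrative sketch: binarization as per-pixel classification with a tiny
# U-Net-style network (not the thesis implementation).
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        # The skip connection concatenates encoder features with the upsampled
        # ones, carrying low-level strokes straight to the output resolution.
        self.out = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        u = self.up(m)
        return self.out(torch.cat([e, u], dim=1))   # per-pixel logits

net = TinyUNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()                    # pixel-wise cross entropy

degraded = torch.rand(4, 1, 64, 64)                 # stand-in degraded patches
clean = (torch.rand(4, 1, 64, 64) > 0.5).float()    # stand-in ground-truth masks

loss = loss_fn(net(degraded), clean)
opt.zero_grad()
loss.backward()
opt.step()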
Item Open Access Detection and localization of tampering in a digital medical image using discrete wavelet transform (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Gadhiya, Tushar; Roy, Anil K.; Mitra, Suman K. The use of digital images as a diagnostic tool has increased tremendously in medical science. It has made investigation easier and quicker, but at the same time it raises the question of the authenticity of the image under scrutiny. Authenticity of digital images is very important in areas such as scientific research, legal proceedings, lifestyle publications, brand marketing, forensic investigations and government documents. With the help of powerful and easy-to-use image editing software like Microsoft Paint and Photoshop, it has become extremely easy to tamper with a digital image for malicious purposes. The digital form of the image has drawn the attention of many researchers towards automatic diagnosis systems for image analysis and enhancement. Such systems use harmless image manipulation operations like brightness enhancement, gamma correction and contrast enhancement, which improve the quality of the image and help in better diagnosis, and so should not be considered tampering. Likely and reported tampering with malicious intent may be found in medical claims, health insurance, or even legal battles in which a medical problem may influence the judicial decision. Since the use of digital images in the medical profession is still at a nascent stage, we address this likely misuse of such input in this thesis. We propose an algorithm that enables anybody to detect whether tampering with such malicious intent has been done, and if so, to localize the tampering almost precisely in the suspect digital medical image. The basis of our proposed algorithm is a hash-based representation of a digital image, using the discrete wavelet transform as a tool. It allows us to identify the direction of tampering, which helps us converge on the tampered object within the localization area. We show that our algorithm is robust against harmless manipulations yet sensitive to even minute tampering. In the case of multiple tamperings, the proposed method is able to identify the location as well as the direction of each, where some existing methods fail. Our proposed technique is fast and generates a smaller hash, as it works with a smaller hash function than comparable techniques.
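A rough sketch, using PyWavelets, of how a block-wise DWT hash can flag and localize a tampered region by comparing approximation and directional detail statistics; the block size, threshold and hash contents are illustrative assumptions rather than the exact scheme proposed in the thesis.

# Illustrative block-wise DWT hash for tamper localization (parameters are
# assumptions, not the thesis scheme).
import numpy as np
import pywt

def dwt_block_hash(img, block=32):
    """Mean approximation/detail statistics per block form a compact hash."""
    h = []
    for r in range(0, img.shape[0] - block + 1, block):
        for c in range(0, img.shape[1] - block + 1, block):
            cA, (cH, cV, cD) = pywt.dwt2(img[r:r+block, c:c+block], 'haar')
            # Horizontal/vertical detail means also hint at the direction in
            # which a block was altered.
            h.append([cA.mean(), np.abs(cH).mean(), np.abs(cV).mean()])
    return np.array(h)

def locate_tampering(original, suspect, block=32, thresh=2.0):
    diff = np.abs(dwt_block_hash(original, block) - dwt_block_hash(suspect, block))
    return np.where(diff.max(axis=1) > thresh)[0]   # indices of suspicious blocks

rng = np.random.default_rng(0)
ref = rng.random((128, 128))
tampered = ref.copy()
tampered[40:60, 40:60] += 5.0                        # simulated local tampering
print(locate_tampering(ref, tampered))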
Item Open Access Object-background segmentation from video (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) Domadiya, Prashant; Mitra, Suman K. Fast and accurate algorithms for background-foreground separation are an essential part of any video surveillance system. GMM (Gaussian Mixture Model) based object segmentation methods give accurate results for background-foreground separation but are computationally expensive. In contrast, modeling each pixel with only a single Gaussian improves the time complexity at the cost of accuracy under illumination variations and a dynamic background. It is observed that these variations affect only a few pixels in an image; most background pixels are unimodal. We propose a method that accounts for the dynamic nature of the background and for low lighting conditions. It is an adaptive approach in which each pixel is modeled as either a unimodal Gaussian or multimodal Gaussians. The flexibility in the number of Gaussians used to model each pixel, together with learning only when it is required, reduces the time complexity of the algorithm significantly. To resolve false negatives caused by the homogeneity of colour and texture between foreground and background, spatial smoothing is carried out by K-means, which improves the overall accuracy of the proposed algorithm. Shadows cause problems in many applications that rely on segmentation results: a shadow changes the RGB values of pixels, so a GMM-based method that depends on RGB values cannot remove shadows from the detection results. A preprocessing stage involving an illumination-invariant representation takes care of the object shadow as well.
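A minimal sketch of the unimodal part of the model described above: one running Gaussian per pixel with a Mahalanobis-style foreground test and selective updates. The learning rate and threshold are illustrative, and the adaptive switch to multimodal Gaussians and the K-means smoothing are not reproduced here.

# Illustrative per-pixel single-Gaussian background model (the thesis adds a
# multimodal switch and K-means smoothing, not shown).
import numpy as np

class UnimodalBackground:
    def __init__(self, first_frame, lr=0.05, k=2.5):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(first_frame.shape, 15.0 ** 2)
        self.lr, self.k = lr, k

    def apply(self, frame):
        frame = frame.astype(np.float64)
        d2 = (frame - self.mean) ** 2
        foreground = d2 > (self.k ** 2) * self.var     # Mahalanobis-style test
        # Update the model only where the pixel still looks like background.
        bg = ~foreground
        self.mean[bg] += self.lr * (frame[bg] - self.mean[bg])
        self.var[bg] += self.lr * (d2[bg] - self.var[bg])
        return foreground.astype(np.uint8) * 255

rng = np.random.default_rng(1)
frames = rng.normal(120, 5, size=(10, 48, 64))         # synthetic static scene
frames[5:, 10:20, 20:30] += 80                          # an object appears later
model = UnimodalBackground(frames[0])
masks = [model.apply(f) for f in frames[1:]]
print(masks[-1][10:20, 20:30].mean())                   # mostly 255 inside the object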
Item Open Access Estimating depth from monocular video under varying illumination (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) Sarupuri, Bhuvaneshwari; Tatu, Aditya. The ability to perceive depth and reconstruct the 3D surface of a scene from an image is basic to many areas of computer vision. Since a 2D image is the projection of a 3D scene onto two dimensions, depth information is lost. Many methods have been introduced to estimate depth from single, two or multiple images, but most previous work on depth estimation has been carried out in the field of stereo vision. Stereo techniques need two images and a whole setup to acquire them, and they suffer from drawbacks in correspondence and hardware implementation. Several cues can be used to model the relation between depth and image features and thereby learn depth from a single image using multi-scale Markov random fields [1]. Here we use Gabor filters to extract a texture-variation cue and refine the depth estimate using shape features. The same approach is used for estimating depth from videos by incorporating temporal coherence. To do this, optical flow is used, and we introduce a novel method of computing optical flow from texture features. Since texture features capture dominant properties of an image that are almost invariant to illumination, the texture-based optical flow is robust to large uniform illumination changes, which has many applications in outdoor navigation and surveillance.
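A small sketch of a Gabor filter-bank texture descriptor of the kind used as the texture-variation cue above, built with scikit-image; the chosen frequencies and orientations are illustrative assumptions.

# Illustrative Gabor filter-bank texture features (parameters are assumptions).
import numpy as np
from skimage.filters import gabor

def gabor_features(gray, frequencies=(0.1, 0.25), n_orient=4):
    """Stack per-pixel Gabor magnitude responses into a feature volume."""
    feats = []
    for f in frequencies:
        for i in range(n_orient):
            theta = i * np.pi / n_orient
            real, imag = gabor(gray, frequency=f, theta=theta)
            feats.append(np.hypot(real, imag))      # magnitude response
    return np.stack(feats, axis=-1)                 # H x W x (len(freq) * n_orient)

rng = np.random.default_rng(2)
patch = rng.random((64, 64))
print(gabor_features(patch).shape)                  # (64, 64, 8)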
Item Open Access Automatic target image detection for morphing (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Vyas, Jaladhi; Joshi, Manjunath V. In this thesis, we propose a novel approach for automatic target image detection for morphing based on 3D textons and contrast. Given a source image containing a frontal human face and training images containing human and animal faces, our algorithm finds the target image automatically from the target database. There are two major advantages to our approach: it solves the problem of manual selection of the target image as done by researchers in the morphing community, and by detecting the target automatically one may achieve a smooth transition from source to destination. Our algorithm aims to find the best target animal face image given a human face as the source. A histogram model based on 3D textons and contrast is built, and the chi-square distance between the histogram models of the source and target images is used to find the best target. After detecting the target image, the control points for the source and target images are detected automatically using facial geometry, an eye-map operator and K-means clustering. The superiority of our algorithm over other methods is that it needs only the source image and the training database, and the entire morphing process is done automatically. Experiments were conducted using four classes of images, namely human, cheetah, lion and monkey, with the human class used as the source. Our target detection results are verified using the Structural Similarity Index (SSIM) between the source and the intermediate morphed image. Experiments on a fairly large dataset have been carried out to show the usefulness and capability of our method.

Item Open Access Manifold valued image segmentation (Dhirubhai Ambani Institute of Information and Communication Technology, 2013) Bansal, Sumukh; Tatu, Aditya. Image segmentation is the process of partitioning an image into different regions or groups based on characteristics such as colour, texture, motion or shape. Segmentation is an intermediate step for a large number of applications, including object recognition and detection. Active contours are a popular variational model for object segmentation in images, in which the user initializes a contour that evolves in order to optimize an objective function designed so that the desired object boundary is the optimal solution. Recently, imaging modalities that produce manifold-valued images have emerged, for example DT-MRI images and vector fields. The traditional active contour model does not work on such images. In the work presented here we generalize the active contour model to manifold-valued images; since ordinary gray-scale images are a specific example of manifold-valued images, our method produces the expected results on them. As an application of the proposed active contour model, we perform texture segmentation on gray-scale images by first creating an appropriate manifold-valued image. We demonstrate segmentation results for manifold-valued images and texture images. The diversity of the texture segmentation problem inspired us to propose a new active contour model for texture segmentation, in which we find the background and foreground texture regions of a given image by maximizing the geodesic distance between the interior and exterior covariance matrices. We also provide results using the proposed method.
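A brief sketch of the affine-invariant geodesic distance between two covariance (SPD) matrices, the quantity maximized between interior and exterior regions in the texture model above; the synthetic feature vectors and regularization are placeholders, not the thesis setup.

# Illustrative affine-invariant geodesic distance between SPD matrices.
import numpy as np
from scipy.linalg import eigh

def spd_geodesic_distance(A, B):
    """d(A, B) = sqrt(sum_i log(lambda_i)^2), lambda_i eigenvalues of A^{-1}B."""
    lam = eigh(B, A, eigvals_only=True)   # generalized eigenvalues, A SPD
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(3)
X = rng.random((200, 5))                   # e.g. feature vectors inside the contour
Y = rng.random((200, 5)) * 2.0             # e.g. feature vectors outside the contour
A = np.cov(X, rowvar=False) + 1e-6 * np.eye(5)
B = np.cov(Y, rowvar=False) + 1e-6 * np.eye(5)
print(spd_geodesic_distance(A, B))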
Item Open Access Locality preserving projection: a study and applications (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Shikkenawis, Gitam; Mitra, Suman K. Locality Preserving Projection (LPP) is a recently proposed approach for dimensionality reduction that preserves neighbourhood information and obtains a subspace that best detects the essential manifold structure of the data. It is widely used for finding the intrinsic dimensionality of data that is usually of high dimension, a characteristic that has made it popular among available dimensionality reduction approaches such as Principal Component Analysis (PCA). A study of LPP reveals that, because it tries to preserve information only about the nearest neighbours of data points, it may lead to misclassification in the overlapping regions of two or more classes when performing data analysis. It has also been observed that the dimension reducibility capacity of conventional LPP is much lower than that of PCA. A new proposal called Extended LPP (ELPP), which amicably resolves both issues, is introduced. In particular, a new weighting scheme is designed that gives importance to data points at a moderate distance, in addition to the nearest points. This helps to resolve the ambiguity occurring in the overlapping regions as well as to increase the reducibility capacity. One of the many applications of LPP is Face Recognition, which is one of the most widely used biometric technologies for person identification. Face images are represented as high-dimensional pixel arrays, and due to the high correlation between neighbouring pixel values they often belong to an intrinsically low-dimensional manifold. The distribution of data in a high-dimensional space is non-uniform and is generally concentrated around some kind of low-dimensional structure. Hence, one way of performing Face Recognition is to reduce the dimensionality of the data and find the subspace of the manifold in which the face images reside. Both LPP and ELPP are used for face and expression recognition tasks. As the aim is to separate the clusters in the embedded space, class membership information may add more discriminating power. With this in mind, the proposal is further extended to a supervised version of LPP (SLPP) that uses the known class labels of data points to enhance the discriminating power while inheriting the properties of ELPP.
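A compact sketch of the basic LPP computation referred to above: heat-kernel weights on a k-nearest-neighbour graph followed by a generalized eigenproblem on the graph Laplacian. The neighbourhood size, kernel width and output dimension are illustrative choices, and the ELPP/SLPP weighting schemes are not shown.

# Illustrative basic LPP (not the extended/supervised variants).
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_dims=2, k=5, t=1.0):
    """X: (n_samples, n_features). Returns a projection matrix (n_features x n_dims)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)      # pairwise squared distances
    W = np.zeros((n, n))
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]                  # k nearest neighbours
    for i in range(n):
        W[i, idx[i]] = np.exp(-d2[i, idx[i]] / t)             # heat-kernel weights
    W = np.maximum(W, W.T)                                    # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Preserve locality: minimize a^T X^T L X a subject to a^T X^T D X a = 1.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])
    vals, vecs = eigh(A, B)
    return vecs[:, :n_dims]                                   # smallest eigenvalues first

rng = np.random.default_rng(4)
faces = rng.random((100, 30))                                 # stand-in for vectorized face images
P = lpp(faces)
print((faces @ P).shape)                                      # (100, 2) embedded coordinates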
Item Open Access Fingerprint image preprocessing for robust recognition (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Munshi, Paridhi; Mitra, Suman K. The fingerprint is the oldest and most widely used form of biometric identification. Since fingerprints are mainly used in forensic science, accuracy in fingerprint identification is highly important, and this accuracy depends on the quality of the image. Most fingerprint identification systems are based on minutiae matching, and a critical step in correctly matching fingerprint minutiae is to reliably extract them from the fingerprint images. However, fingerprint images may not be of good quality: they may be degraded and corrupted due to variations in skin, pressure and impression conditions. Most feature extraction algorithms work on binary images rather than the gray-scale image, and the result of feature extraction depends on the quality of the binary image used. Keeping these points in mind, image preprocessing comprising enhancement and binarization is proposed in this work. This preprocessing is employed prior to minutiae extraction to obtain a more reliable estimate of minutiae locations and hence a robust matching performance. In this dissertation we give an introduction to fingerprint structure and the identification system, discuss the proposed methodology and the implementation of the fingerprint image enhancement technique, and then propose a rough-set based method for binarization, followed by a discussion of methods for minutiae extraction. Experiments are conducted on real fingerprint images to evaluate the performance of the implemented techniques.

Item Open Access Multiresolution fusion using compressive sensing and graph cuts (Dhirubhai Ambani Institute of Information and Communication Technology, 2012) Harikumar, V.; Joshi, Manjunath V. Multiresolution fusion refers to enhancing the low spatial resolution (LR) of multispectral (MS) images to that of the panchromatic (Pan) image without compromising the spectral details. Many present-day methods for multiresolution fusion require that the Pan and MS images be registered. In this thesis we propose a new approach to multiresolution fusion based on the theory of compressive sensing and graph cuts. We first estimate a close approximation to the fused image by using the sparseness in the given Pan and MS images. Assuming that the Pan and LR MS images have the same sparseness, the initial estimate of the fused image is obtained as a linear combination of Pan blocks, whose weights are estimated by l1 minimization using the MS and the downsampled Pan image. The final solution is obtained using a model-based approach: the low-resolution MS image is modeled as a degraded and noisy version of the fused image, in which the degradation matrix entries are estimated from the initial estimate and the MS image. Since MS fusion is an ill-posed inverse problem, we use a regularization-based approach to obtain the final solution, with a truncated quadratic prior for preserving the discontinuities in the fused image. A suitable energy function consisting of a data fitting term and the prior term is then formed and minimized using a graph cuts based approach in order to obtain the fused image. The advantages of the proposed method are that it does not require registration of the Pan and MS data, and that the spectral characteristics are well preserved in the fused image since we do not operate directly on the Pan digital numbers. The effectiveness of the proposed method is illustrated by experiments on synthetic as well as real satellite images. Quantitative comparison with state-of-the-art approaches in terms of Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS), Correlation Coefficient (CC), Relative Average Spectral Error (RASE) and Spectral Angle Mapper (SAM) indicates the superiority of our approach.
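A loose sketch of the sparse-weight step described above: one LR MS block is expressed as an l1-regularized (Lasso) combination of downsampled Pan blocks, and the same weights are reused on the full-resolution Pan blocks. The block size, decimation factor and penalty are illustrative assumptions, and the model-based graph-cuts refinement is not shown.

# Illustrative sparse-coding step for the initial fused-image estimate.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
q = 4                                              # assumed LR-to-HR resolution factor
pan_blocks = rng.random((50, 32, 32))              # dictionary of HR Pan blocks
pan_lr = pan_blocks[:, ::q, ::q]                   # downsampled Pan blocks (8x8)
ms_block = rng.random((8, 8))                      # one LR MS block to approximate

# Columns of the dictionary are the vectorized LR Pan blocks.
D = pan_lr.reshape(pan_lr.shape[0], -1).T          # shape (64, 50)
y = ms_block.ravel()                               # shape (64,)

w = Lasso(alpha=1e-3, positive=True, max_iter=5000).fit(D, y).coef_  # sparse weights
hr_estimate = np.tensordot(w, pan_blocks, axes=1)  # weighted sum of HR Pan blocks
print(np.count_nonzero(w), hr_estimate.shape)      # a few active blocks, (32, 32)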