Learning cross domain relations using deep learning
Abstract
Generative Adversarial Networks (GANs) have achieved exemplary performance
in generating realistic images. They also produce good results in image-to-image
translation. In this thesis, we explore the use of GANs for cross-domain image
mapping, applied to facial expression transfer. In facial expression transfer, the
expression of a source image is transferred onto a target image. We use a DiscoGAN
(Discovery GAN) model for the task. Using the DiscoGAN, an image of the target is
generated with the facial features of the source.
The model uses a feature matching loss along with the GAN objective and a
reconstruction loss. We propose a method to train the DiscoGAN with paired source
and target images. To learn the cross-domain image mapping, we train the
DiscoGAN with a batch size of 1.
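
As a rough illustration of how the three loss terms named above might be combined for one generator direction, the following NumPy sketch may help. It is not the thesis implementation: the function names, the mean-squared-error form of the reconstruction term, the batch-mean form of the feature matching term, and the weights `w_gan`, `w_rec`, `w_fm` are all assumptions made for this example.

```python
import numpy as np

def gan_generator_loss(d_fake_prob):
    """Non-saturating GAN term: -log D(G(x)), averaged over the batch."""
    p = np.clip(d_fake_prob, 1e-12, 1.0)
    return float(-np.mean(np.log(p)))

def reconstruction_loss(x, x_rec):
    """Mean squared error between an image and its round-trip reconstruction."""
    return float(np.mean((x - x_rec) ** 2))

def feature_matching_loss(feats_real, feats_fake):
    """Squared distance between batch-mean discriminator features, summed over layers."""
    total = 0.0
    for f_real, f_fake in zip(feats_real, feats_fake):
        diff = f_real.mean(axis=0) - f_fake.mean(axis=0)
        total += float(np.mean(diff ** 2))
    return total

def generator_loss(d_fake_prob, x, x_rec, feats_real, feats_fake,
                   w_gan=1.0, w_rec=1.0, w_fm=1.0):
    """Weighted sum of the three terms (the weights are hypothetical)."""
    return (w_gan * gan_generator_loss(d_fake_prob)
            + w_rec * reconstruction_loss(x, x_rec)
            + w_fm * feature_matching_loss(feats_real, feats_fake))

# Toy example with a batch size of 1, as in the training setup described above.
rng = np.random.default_rng(0)
x = rng.random((1, 3, 8, 8))                   # one source image
x_rec = x + 0.1                                # imperfect reconstruction
feats_real = [rng.random((1, 16))]             # one discriminator feature layer
feats_fake = [feats_real[0] + 0.2]
loss = generator_loss(np.array([0.5]), x, x_rec, feats_real, feats_fake)
```

In a full DiscoGAN the same sum would be computed for both mapping directions and added together; the sketch shows one direction only.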
In the second part of this thesis, we propose an algorithm to binarize degraded
document images. We use a U-Net for this task. We model document image
binarization as a classification problem in which we generate an image by
classifying each pixel as text or background. By optimizing
the cross-entropy loss function, we translate the input degraded image
to the corresponding binarized image. Our use of a U-Net ensures that low-level
features are transferred from the input degraded image to the output binarized image,
which makes it better suited to the task than a simple convolutional neural network. Our
method of training leads to the desired results faster when both the degraded document
images and the ground-truth binarized images are available for training, and it also
generalizes well. The results obtained are significantly better than state-of-the-art
techniques, and the approach is simpler than other deep learning approaches
for document image binarization.
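
To make the per-pixel classification view concrete, here is a minimal NumPy sketch of a pixel-wise binary cross-entropy loss and of turning predicted text probabilities into a binarized image. It assumes the network outputs one text probability per pixel; the function names, the 0.5 threshold, and the black-text-on-white output convention are assumptions for this example, not details taken from the thesis.

```python
import numpy as np

def pixelwise_cross_entropy(prob_text, target):
    """Binary cross-entropy averaged over all pixels.

    prob_text: predicted probability that each pixel is text, in [0, 1].
    target:    ground-truth labels, 1 for text and 0 for background.
    """
    p = np.clip(prob_text, 1e-12, 1.0 - 1e-12)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def binarize(prob_text, threshold=0.5):
    """Map text pixels to black (0) and background pixels to white (255)."""
    return np.where(prob_text >= threshold, 0, 255).astype(np.uint8)

# Toy 2x2 "document": two text pixels and two background pixels.
prob = np.array([[0.9, 0.1],
                 [0.2, 0.8]])
truth = np.array([[1, 0],
                  [0, 1]])
loss = pixelwise_cross_entropy(prob, truth)
binary_image = binarize(prob)
```

During training the loss would be minimized with respect to the U-Net weights; the thresholding step applies only at inference time.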