M Tech Dissertations

Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3

Search Results

Now showing 1 - 9 of 9
  • Item (Open Access)
    On the Robustness of Federated Learning towards Various Attacks
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Yagnik, Shrey Devenkumar; Singh, Priyanka; Joshi, Manjunath V.
    This thesis presents a study of Federated Learning (FL), a decentralized learning paradigm in which clients train locally and a central server returns the federated average of their updates. Deep learning models have been used in numerous security-critical settings since they perform well on various tasks; here, we study different kinds of attacks on FL. FL has become a popular distributed training method because it enables users to benefit from large datasets without sharing them: once the model has been trained on data held on local devices, only the updated model parameters are sent to the central server. Because the FL approach is distributed, an adversary could launch an attack to influence the model's behavior. In this work, we studied a Backdoor attack, a black-box attack in which a few poisonous instances are added in order to check the model's behavior at test time. We also conducted three types of white-box attacks: the Fast Gradient Sign Method (FGSM), Carlini-Wagner (CW), and DeepFool. We carried out various experiments on the standard CIFAR10 dataset to alter the model's behavior, using ResNet20 and DenseNet as the deep neural networks, and found adversarial samples to which the required perturbation is added to fool the model into giving misclassifications. This decentralized approach to training can make it more difficult for attackers to access the training data, but it can also introduce new vulnerabilities that attackers can exploit. We found that the expected behavior of the model can be compromised without much difference in the training accuracy.
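
As an illustration of the white-box setting described in this abstract, the following is a minimal sketch of an FGSM-style perturbation in PyTorch; the model, loss, and epsilon value are placeholders and not taken from the thesis.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, images, labels, epsilon=0.03):
    """Return adversarial examples x + epsilon * sign(grad_x loss)."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to a valid range.
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

# Hypothetical usage with any image classifier, e.g. a ResNet20 trained on CIFAR10:
# adv_batch = fgsm_perturb(resnet20, batch_images, batch_labels, epsilon=8/255)
```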
  • Item (Open Access)
    Quantile Regression and Deep Learning Models for Air Quality Analysis and Prediction in Delhi City
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Jha, Gaurav; Anand, Pritam
    Quantile regression models have gained popularity among researchers in recent years. A mean regression model estimates the mean of y given x, but in some applications this conditional mean alone is not very useful and the quantiles of y given x are of interest. This thesis presents a data-driven analysis and prediction of air quality in the Delhi metro city using quantile regression and deep learning models. The main objectives are: to investigate the monthly trend and the correlation of PM2.5, PM10, NO2, and SO2 concentrations with temperature; to compare different regression models, such as linear, quadratic, kernel, and quantile regression, for estimating the PM2.5, PM10, NO2, and SO2 concentrations from the temperature variables; and to compare different models, such as gated recurrent units (GRUs), vanilla (simple) long short-term memory (LSTM) networks, convolutional neural network-long short-term memory (CNN-LSTM) networks, and support vector regression (SVR), for time series forecasting of pollution levels. The data used in this study is the Delhi air quality data from 2015 to 2020, which contains various pollutants and environmental factors. The results show that quantile regression is more flexible, robust, and informative than the other regression models and can capture the variability and diversity of the PM2.5, PM10, NO2, and SO2 distributions over distinct quantiles or percentiles. The results also show that deep learning models are effective and powerful tools for time series forecasting on pollution data; among them, the SVR model is superior to the other models. The study aims to contribute to the scientific knowledge and practical solutions for air quality prediction and analysis.
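
For context, quantile regression replaces the squared error of mean regression with the pinball (quantile) loss; below is a minimal NumPy sketch of that loss, not code from the thesis, and the example values are hypothetical.

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss for quantile level tau in (0, 1)."""
    residual = y_true - y_pred
    # Under-predictions are weighted by tau, over-predictions by (1 - tau).
    return np.mean(np.maximum(tau * residual, (tau - 1) * residual))

# Example: the 0.9-quantile penalizes under-prediction more heavily than the 0.1-quantile.
y = np.array([10.0, 12.0, 50.0])            # hypothetical PM2.5 readings
print(pinball_loss(y, y - 5.0, tau=0.9))    # 4.5
print(pinball_loss(y, y - 5.0, tau=0.1))    # 0.5
```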
  • Item (Open Access)
    Automated Handwritten Answer Sheet Evaluation System Using Deep Learning Methods
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Ratanghayara, Khushali; Roy, Anil K.; Anand, Pritam
    Automation has gained significant prominence in various technological domains, offering the potential to streamline processes and minimize human error. Even though education and the teaching-learning process are believed to require human-to-human interaction, automation holds tremendous promise here too, particularly in tasks such as student registration, class attendance, administrative duties, and answer sheet evaluation. This thesis focuses on automating the evaluation of answer sheets submitted by students in an examination. Within the scope of the thesis, we consider answers to only two types of questions: multiple choice questions, for which one of the four letters a, b, c, or d is selected as the correct answer, and objective-type questions, which have single-word answers such as "Yes" or "No", "correct" or "incorrect", "true" or "false". We therefore need to "read" these ten possible answers and mark each response as right or wrong; a right answer fetches a positive mark, while a wrong answer attracts a zero or a negative mark, as the case may be. We took a two-pronged approach to achieve automation. First, we tried a classification approach, in which our system was trained to classify an answer into one of these ten classes, i.e., a, b, c, d, Yes, No, Correct, Incorrect, True, False. We employed an object detection model, YOLO, capable of classifying this fixed set of ten classes representing possible answers. The model achieved an impressive accuracy of 93% and demonstrated the potential to automate the evaluation of multiple-choice and one-word answer-type questions. Our system worked well until it encountered an answer belonging to a new class, for example, "right": the system tries to read it as one of the ten classes, which is wrong and reduces the efficiency of the automation system. To experiment with this approach, we created our own handwritten dataset of these ten classes, containing over 24,200 samples. To address the limitations of the first approach, we reformulated the problem as a recognition problem, leveraging text recognition models that combine convolutional neural networks (CNNs) and recurrent neural networks (RNNs). YOLO was used to recognize each of the 26 letters of the English alphabet, and then five deep learning models, namely CNN, CNN + RNN, CNN + LSTM, CNN + Bidirectional LSTM, and CNN + Bidirectional GRU, were used to read the word. These models were capable of recognizing all words written in the answer sheets. We then matched the recognized words against a fixed set of answers; in cases where a match was not found, we set those responses aside for further evaluation. Through a comparison of the outputs of each model, we achieved an impressive accuracy of 91%. This outcome underscores the effectiveness of employing diverse methods to automate the evaluation of answer sheets. For this approach, we used the benchmark IAM word dataset [17] to train the deep learning models and combined it with our self-generated dataset of handwritten samples to test the automatic answer sheet evaluation system. In future work, the remaining two types of questions, i.e., short answer and long answer questions, may also be included in this answer sheet evaluation automation system; this will require an NLP-based approach to evaluate the answers contextually.
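
To make the second, recognition-based pipeline concrete, here is a minimal PyTorch sketch of a CNN + bidirectional LSTM word reader of the kind described above; the layer sizes and character-set size are illustrative assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Small CNN feature extractor followed by a bidirectional LSTM decoder."""
    def __init__(self, num_chars=27):  # 26 letters + 1 blank symbol (assumed)
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64 * 8, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 128, num_chars)

    def forward(self, x):               # x: (batch, 1, 32, width)
        f = self.cnn(x)                 # (batch, 64, 8, width/4)
        f = f.permute(0, 3, 1, 2)       # time steps along the width axis
        f = f.flatten(2)                # (batch, width/4, 64*8)
        out, _ = self.rnn(f)
        return self.fc(out)             # per-time-step character scores

# Hypothetical usage on a 32x128 grayscale word image:
logits = CRNN()(torch.randn(1, 1, 32, 128))   # shape (1, 32, 27)
```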
  • Item (Open Access)
    Polarimetric SAR Image Classification using Gaussian Context Transformer in Complex-Valued Convolutional Neural Networks
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2023) Pandya, Utkarsh Samirbhai; Mandal, Srimanta
    There have been many advancements in the field of terrain classification using Polarimetric SAR (PolSAR) images/data. This thesis explores classical methods as well as deep learning methods for this task: the covariance matrix of a land sample is classified into different terrains such as various crops, urban areas, or water. The PolSAR covariance matrix has both amplitude and phase components. Statistical techniques such as the Wishart classifier and the Wishart Mixture Model with Conditional Random Field (WMM-CRF) approach exploit the inherent mathematical properties of the data, while deep learning techniques such as complex-valued CNNs and Squeeze-and-Excitation networks use neural networks to learn correlations in spatial data as well as inter-channel dependencies. There have been studies on retrofitting deep learning models with components that can leverage predetermined data patterns in a dataset. The Gaussian Context Transformer is one such technique: it exploits inter-channel dependencies with a predetermined mathematical form while the rest of the model learns spatial-contextual parameters. To deal with the multiplicative noise, and since no noise-free ground truth images are available, data augmentation is performed with several image processing techniques, such as the Box-Car filter, the Lee-Sigma filter, and Mean-Shift filters, to reduce the effects of the noise as much as possible. The effects of the Gaussian Context Transformer and data augmentation on one Indian land sample, namely Mysore, and three European land samples, namely Flevoland-7, Flevoland-15, and Landes, show promising results.
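
As a rough illustration of the channel-attention idea behind the Gaussian Context Transformer, here is a real-valued PyTorch sketch that maps channel-wise global context through a fixed Gaussian excitation; the thesis works with complex-valued networks, so this is only a simplified sketch, and the Gaussian width constant is an assumption.

```python
import torch
import torch.nn as nn

class GaussianContextBlock(nn.Module):
    """Parameter-free channel attention: Gaussian of normalized channel means."""
    def __init__(self, c=2.0):          # Gaussian width; a hand-picked constant here
        super().__init__()
        self.c = c

    def forward(self, x):               # x: (batch, channels, H, W)
        z = x.mean(dim=(2, 3))          # global average pooling per channel
        z = (z - z.mean(dim=1, keepdim=True)) / (z.std(dim=1, keepdim=True) + 1e-5)
        g = torch.exp(-(z ** 2) / (2 * self.c ** 2))   # excitation in (0, 1]
        return x * g.unsqueeze(-1).unsqueeze(-1)       # rescale each channel

# Hypothetical usage on a feature map from a PolSAR CNN:
out = GaussianContextBlock()(torch.randn(2, 64, 16, 16))
```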
  • Item (Open Access)
    English Handwritten Word Recognition
    (2021) Shah, Vidit; Khare, Manish; Bhilare, Shruti
    Today, tons of data is generated every day, and this helps with the automation of several tasks. Automated recognition of handwritten words from images is one such challenging task, which can be done by extracting the important features of an image. The major challenge for handwritten word recognition over optical word recognition is the inherent variation in handwriting styles, so it is of utmost importance to build handwritten word recognition models with high accuracy. Such a model faces multiple challenges that must be overcome to accurately predict a given word on its own. It can be used in pharmaceuticals to convert prescription or report images into scanned documents and store the relevant information from them. In this work, I build a deep-learning-based model for an English handwritten dataset that can recognize words from images. The dataset used here is the publicly available IAM word dataset. A CNN architecture helps extract features from the images; features could be in the form of edges or blurred versions of the image. An RNN helps the model learn from previous states and predict the output for the next state, a process called sequential learning. Combining the feature extraction strength of the CNN with the sequence learning of the RNN, i.e., a C-RNN, I obtained 72.46% accuracy and an 11.88% character error rate. Accuracy depends on the dataset used for training.
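
Since the abstract reports a character error rate (CER), the following is a small sketch of how CER is typically computed from the Levenshtein edit distance; it is illustrative and not code from the thesis.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def character_error_rate(references, predictions):
    """Total edit distance divided by total reference length."""
    edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    chars = sum(len(r) for r in references)
    return edits / chars

# Example: one wrong character out of five gives a CER of 0.2.
print(character_error_rate(["hello"], ["helio"]))
```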
  • Item (Open Access)
    Image Super Resolution Using Deep Neural Networks
    (2021) Singh, Harsh Vardhan; Kumar, Ahlad
    The recent outbreak of COVID-19 has motivated researchers to contribute to the area of medical imaging using artificial intelligence and deep learning. Super-resolution (SR) has, in the past few years, produced remarkable results using deep learning methods. The ability of deep learning methods to learn the non-linear mapping from low-resolution (LR) images to their corresponding high-resolution (HR) images leads to compelling SR results in diverse areas of research. In this work, we propose a deep learning based image super-resolution architecture in the Tchebichef transform domain. This is achieved by integrating a transform layer into the proposed architecture through a customized Tchebichef convolutional layer (TCL). The role of the TCL is to convert the LR image from the spatial domain to the orthogonal transform domain using Tchebichef basis functions. The inverse of this transformation is achieved using another layer, the Inverse Tchebichef convolutional layer (ITCL), which converts the LR images back from the transform domain to the spatial domain. We observe that working in the Tchebichef transform domain for the task of SR takes advantage of the high- and low-frequency representation of images, which simplifies the task of super-resolution. We further introduce a transfer learning approach to enhance the quality of COVID-related medical images, and we show that our architecture enhances the quality of X-ray and CT images of COVID-19, providing better image quality that helps in clinical diagnosis. Experimental results show that our architecture is competitive with most deep learning methods while using a fewer number of trainable parameters.
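
To give a feel for the transform step, here is a small NumPy sketch that builds an orthonormal discrete Tchebichef-style basis by orthonormalizing the monomials on an integer grid (via a QR factorization) and applies it as a separable 2-D forward and inverse transform; it is a simplified, assumption-based illustration, not the TCL/ITCL layers from the thesis.

```python
import numpy as np

def tchebichef_basis(n):
    """Orthonormal discrete polynomial basis on the grid {0, ..., n-1}.

    Orthonormalizing the monomials 1, x, x^2, ... with uniform weights yields
    the normalized discrete Tchebichef polynomials, up to sign.
    """
    x = np.arange(n, dtype=float)
    vandermonde = np.vander(x, n, increasing=True)   # columns: 1, x, x^2, ...
    q, _ = np.linalg.qr(vandermonde)                 # orthonormal columns
    return q.T                                       # rows are basis functions

def forward_transform(img, t):
    return t @ img @ t.T          # 2-D separable transform

def inverse_transform(coeffs, t):
    return t.T @ coeffs @ t       # exact inverse since t is orthogonal

# Round-trip check on a random 8x8 patch:
T = tchebichef_basis(8)
patch = np.random.rand(8, 8)
print(np.allclose(inverse_transform(forward_transform(patch, T), T), patch))  # True
```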
  • Item (Open Access)
    Video captioning
    (2020) Laheri, Vishal Bharatkumar; Mandal, Srimanta
    In recent years, models for the video captioning task have improved considerably. Despite this advancement, the task is still impeded by hardware constraints: video captioning models take a sequence of frames and a caption as inputs, which makes video captioning one of the most memory- and computation-intensive tasks. In this project work, we exploit the importance of the frames required from the video to obtain the desired performance. We also propose the use of a video summarizing model embedded with the captioning model for dynamically selecting frames, which allows a reduction in the number of required frames without losing the spatio-temporal information of the video.
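
As a toy illustration of dynamic frame selection, the sketch below scores frames with a placeholder summarizer and keeps only the top-k before captioning; the scoring function and the value of k are assumptions, not the thesis's summarization network.

```python
import torch

def select_frames(frames, scorer, k=16):
    """Keep the k frames the summarizer scores as most important.

    frames: tensor of shape (num_frames, C, H, W)
    scorer: any callable mapping frames to one importance score per frame
    """
    with torch.no_grad():
        scores = scorer(frames).flatten()                # (num_frames,)
    keep = torch.topk(scores, k).indices.sort().values   # preserve temporal order
    return frames[keep]

# Hypothetical usage with a stand-in scorer (mean pixel intensity):
video = torch.rand(120, 3, 224, 224)
subset = select_frames(video, scorer=lambda f: f.mean(dim=(1, 2, 3)), k=16)
print(subset.shape)   # torch.Size([16, 3, 224, 224])
```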
  • Item (Open Access)
    Sentence detection
    (2020) Shah, Pushya; Mitra, Suman K.
    Sentence detection is a very important task for any natural language processing (NLP) application. The accuracy and performance of all downstream NLP tasks, such as sentiment analysis, text classification, named entity recognition (NER), relation extraction, etc., depend on the accuracy of correctly detected sentence boundaries. The clinical domain is very different from the general language domain: clinical sentence structure and vocabulary differ from general English. That is why the available sentence boundary detection tools do not perform well on the clinical domain, and a specific sentence detection model is required for clinical documents. ezDI Solutions (India) LLP has developed such a system that can detect sentence boundaries. We examined the Bidirectional Encoder Representations from Transformers (BERT) and Bidirectional Long Short-Term Memory (BiLSTM) algorithms and used a BiLSTM-BERT hybrid model for sentence boundary detection on medical corpora.
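
As a rough sketch of a BiLSTM-BERT hybrid for boundary tagging, the code below runs a BiLSTM over frozen BERT token embeddings and predicts a boundary/non-boundary label per token; the model name, tag set, hidden size, and example sentence are assumptions, not the system built at ezDI.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BiLSTMBoundaryTagger(nn.Module):
    """BiLSTM over BERT embeddings, one boundary/non-boundary label per token."""
    def __init__(self, encoder_name="bert-base-uncased", hidden=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, 2)   # boundary vs. not

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():                        # keep BERT frozen in this sketch
            emb = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        out, _ = self.lstm(emb)
        return self.classifier(out)                  # (batch, seq_len, 2)

# Hypothetical usage on a clinical-style snippet:
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["pt is afebrile vitals stable will discharge today"],
            return_tensors="pt")
logits = BiLSTMBoundaryTagger()(batch["input_ids"], batch["attention_mask"])
```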
  • Item (Open Access)
    Image aesthetic assessment using deep learning
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Sherashiya, Shyam; Mitra, Suman K.
    Image aesthetic assessment is the task of classifying images as aesthetically good or aesthetically bad. In the era of digital media, videos and images have an ever-greater impact on human life, and image aesthetic assessment is an important part of digital media. Earlier research in this area was based on photographic rules, generic image descriptors, or hand-crafted features. These photographic rule-based approaches have their limitations, such as the approximations involved in applying the rules in an implementation; it was observed that photographic rules such as color distribution, brightness, hue count, and low contrast are not enough to judge image aesthetics. Hand-crafted features may be suited to a specific task but may not fully cover the feature space that represents the primary characteristics needed for the image aesthetic task. In recent research, deep learning based approaches have achieved great success on the image aesthetic assessment problem. In this thesis, we have implemented various multi-channel CNN architectures to classify images into high aesthetic and low aesthetic images. We have also applied some pre-processing techniques to the raw data, such as various crops, padding, and class activation map (CAM) techniques. Along with that, we have incorporated various pre-trained deep learning models, such as VGG19, InceptionV3, and ResNet50, into the multi-channel CNN networks and analyzed their impact. The experiments are conducted on the AVA dataset and show improvements in the image aesthetic assessment task over existing approaches.
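
To illustrate the multi-channel idea, here is a minimal PyTorch sketch that feeds two views of an image (for example, a global view and a crop) through separate backbones and concatenates their features for a two-class aesthetic head; the backbone choice, feature sizes, and the two-view setup are assumptions, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoChannelAesthetic(nn.Module):
    """Two backbones (one per image view), concatenated features, binary head."""
    def __init__(self):
        super().__init__()
        # weights=None keeps the sketch self-contained; in practice one would
        # load ImageNet weights, as the thesis does with pre-trained models.
        self.global_branch = models.resnet50(weights=None)
        self.global_branch.fc = nn.Identity()        # 2048-d features
        self.crop_branch = models.resnet50(weights=None)
        self.crop_branch.fc = nn.Identity()
        self.head = nn.Linear(2048 * 2, 2)           # high vs. low aesthetic

    def forward(self, full_img, crop_img):
        f = torch.cat([self.global_branch(full_img),
                       self.crop_branch(crop_img)], dim=1)
        return self.head(f)

# Hypothetical usage with a resized full image and a crop of the same image:
model = TwoChannelAesthetic()
logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
```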