M Tech Dissertations
Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3
10 results
Search Results
Item Open Access
Deep Learning for Severity Level-based Classification of Dysarthria (2021)
Gupta, Siddhant; Patil, Hemant A.
Dysarthria is a motor speech disorder in which the muscles required for speech are damaged or paralyzed, which adversely affects articulation and renders the speech unintelligible. Dysarthria is considered one of the most common forms of speech disorder. It occurs as a result of several neurological and neurodegenerative diseases, such as Parkinson’s disease and cerebral palsy. People with dysarthria have difficulty conveying vocal messages and emotions, which in many cases leads to depression and social isolation. Dysarthria has become a major speech technology issue because systems that work efficiently for normal speech, such as Automatic Speech Recognition systems, do not give satisfactory results for dysarthric speech. In addition, people with dysarthria are generally limited in their motor functions, so the development of voice-assisted systems for them becomes all the more crucial. Furthermore, analysis and classification of dysarthric speech can be useful in tracking the progression of the disease and its treatment in a patient. In this thesis, dysarthria is studied as a speech technology problem, with the aim of classifying dysarthric speech into four severity levels. Since people with dysarthria have difficulty producing long utterances, short-duration speech segments (at most 1 s) are used for the task, to explore the practical applicability of the work. In addition, dysarthric speech is analyzed using different representations, such as time-domain waveforms, the linear prediction profile, the Teager Energy Operator profile, and the Short-Time Fourier Transform, to identify the most representative feature for the classification task.
With the rise of Artificial Intelligence, deep learning techniques have gained significant popularity in machine classification and pattern recognition tasks. Therefore, to keep the thesis work relevant, several machine learning and deep learning techniques, such as Gaussian Mixture Models (GMM), Convolutional Neural Network (CNN), Light Convolutional Neural Network (LCNN), and Residual Neural Network (ResNet), have been adopted. The severity level-based classification task is evaluated using popular measures such as classification accuracy and F1-score. For comparison with short-duration speech, classification is also performed on long-duration speech (more than 1 s). Furthermore, to enhance the relevance of the work, experiments are performed on the statistically meaningful and widely used Universal Access-Speech Corpus.

Item Open Access
Deep learning techniques for speech pathology applications (2020)
Purohit, Mirali Virendrabhai; Patil, Hemant A.
Human-machine interaction has gained more attention due to its interesting applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision and the medical domain. Massive success has been achieved in developing speech-based systems, e.g., Intelligent Personal Assistants (IPAs), chatbots, and Text-To-Speech (TTS) systems. However, these systems have certain limitations. Speech processing systems work efficiently only on normal-mode speech and hence perform poorly on other kinds of speech, such as impaired speech, far-field speech, and shouted speech. This thesis contributes to the improvement of impaired speech. To address this problem, the work takes two major approaches: 1) classification, and 2) conversion.
A new paradigm, namely weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, the effectiveness of a residual network-based classifier over the traditional convolutional neural network-based model is shown for multi-class classification of pathological speech. Furthermore, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech and thereby improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by the work done using the VC-based technique, this thesis also contributes to the voice conversion field. To that effect, a state-of-the-art system, namely the adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art method for voice conversion.

Item Open Access
Apparel attributes classification using deep learning (2020)
Desai, Harsh Sanjaykumar; Jat, P.M
Apparel attribute classification has practical applications in e-commerce. The project is for the www.Blibli.com website, an e-commerce platform in Indonesia and a partner of Coviam Technologies. This report describes an approach to classifying attributes specific to various apparel items, such as material, neck/collar, and sleeve type, using Natural Language Processing and Deep Learning techniques. The products classified by attribute will be used as filters on the search results page to improve the website's search mechanism. We have classified 95% of apparel products by the material attribute and achieved 87% test accuracy on neck/collar attribute classification. The report is divided into four main parts: Introduction, Dataset Preparation, Methodology, and Experimentation.
Lastly, other similar work performed during the internship is discussed, along with future work.

Item Open Access
Clickbait detection using deep learning Techniques (2020)
Parikh, Apurva Ketanbhai; Majumder, Prasenjit
With the growing shift towards news consumption primarily through social media sites such as Twitter and Facebook, most news agencies are promoting their stories on social media platforms. These news agencies are publishing fake news on social media to generate revenue by enticing users to click on their articles. To increase readership, agencies use eye-catching headlines accompanied by an article link to attract readers to the article. These attractive headlines are called clickbait. Usually, a clickbait article does not meet the user's expectations. In this work, we try to develop an end-to-end clickbait detection system using the Transformer-based model Bidirectional Encoder Representations from Transformers (BERT). We also found a few clickbait-specific features which we hypothesised can be used along with the BERT model to develop a better classifier. Our proposed BERT-based approach significantly outperformed the baseline paper, which used a BiLSTM.

Item Open Access
Applications of deep-learning at digital communication receiver (Dhirubhai Ambani Institute of Information and Communication Technology, 2020)
Nanavati, Tilak Digantkumar; Vasavada, Yash
Modulation and demodulation are fundamental modules of communication systems. The modulation techniques Offset QPSK (OQPSK), π/2 BPSK, π/4 QPSK and GMSK are frequently applied in power-constrained wireless communication links (e.g., the terminal transmission links of several 2G, 3G and 4G terrestrial and satellite air-interface standards). However, a detailed numerical comparison of their performance and functional characteristics is currently lacking in the literature.
Prior studies have focused on comparing at most two of these four schemes (typically OQPSK versus GMSK). One of the objectives of this thesis is to bridge this gap. We provide a detailed comparison of (i) the spectral regrowth and (ii) the probability of bit error Perrb versus Eb/N0 performance of these four modulation schemes in the presence of AM/AM and AM/PM non-linearities with varying backoff (BO). We believe that our results and key observations will be beneficial in selecting an appropriate modulation technique when designing practical communication systems. Another crucial component of communication and signal processing systems is the estimation of channel parameters. In practical communication systems, varying channel conditions and non-linear channel impairments make the estimation task more challenging. We propose a Deep Learning (DL) application at the digital communication receiver to estimate channel impairments that are difficult to describe with a rigid, mathematically tractable model. Another objective of our research is to develop a learned parameter estimator that effectively captures the non-linear functional mappings and produces accurate estimates. The Phase Offset (PO) impairment estimates obtained with our proposed approach achieve accuracy competitive with the baseline equivalent. Lastly, we demonstrate a learning-based modulation classifier that potentially solves the misclassification problem presented in an earlier study.

Item Open Access
Augmenting dialogue generation using dialogue act embeddings: a transfer learning approach (Dhirubhai Ambani Institute of Information and Communication Technology, 2020)
Bisht, Abhimanyu Singh; Majumder, Prasenjit
The following work looks at contemporary end-to-end dialogue systems with the aim of improving dialogue generation in an open-domain setting.
It provides an overview of popular literature in the domain of dialogue generation, followed by a brief look at how human dialogue is understood from the perspective of Linguistics and Cognitive Science. We try to extract useful ideas from these domains of research and implement them in a transfer learning approach in which a pretrained language model is supplemented with dialogue act information using special embeddings. The hypothesis behind the proposed approach is that the dialogue act information will aid the generation process. The proposed approach is then compared with a baseline approach on the DailyDialog [12] dataset using perplexity as the evaluation metric. Though the proposed approach is a significant improvement over the baseline, ablation analysis shows the contribution of the Dialogue Act Embeddings to that improvement to be marginal.

Item Open Access
Imbalanced bioassay data classification for drug discovery (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
Shah, Jeni Snehal; Joshi, Manjunath V.
Pattern recognition methods show inferior performance when the dataset presented to them is imbalanced, i.e., when the samples belonging to one class greatly outnumber the samples from the other class(es). For this reason, imbalanced dataset classification has been an active area of research in machine learning. In this thesis, a novel approach to classifying imbalanced bioassay data is presented. Bioassay data classification is an important task in drug discovery. Bioassay data consists of feature descriptors of various compounds and a corresponding label that denotes each compound's potency as a drug: active or inactive. This data is highly imbalanced, with the percentage of active compounds ranging from 0.1% to 1.4%, leading to inaccuracies in classification for the minority class.
An approach to classification is proposed in which separate models are trained using different features derived by training stacked autoencoders (SAE). After the features are learned using SAEs, feed-forward neural networks (FNN), trained to minimize a class-sensitive cost function, are used for classification. Before the features are learned, data cleaning is performed using the Synthetic Minority Oversampling Technique (SMOTE) and by removing Tomek links. Different levels of features can be obtained using SAEs. While some active samples may not be correctly classified by a network trained on a certain feature space, it is assumed that they can be classified correctly in another feature space. This is the underlying assumption behind learning hierarchical feature vectors and learning separate classifiers for each feature space.

Item Open Access
Personalized News-Feeds Recommendation System (Dhirubhai Ambani Institute of Information and Communication Technology, 2017)
Paliwal, Ankit; Dasgupta, Sourish
The idea of personalized recommendations is a very important factor, both for users and for organizations. Users want their experience on a website to be as comfortable as possible, and organizations want to attract more and more users to their platforms. Whether it is shopping online or gathering information from across the world, recommendation engines are changing the way people interact with these online systems and helping to make their experiences better. Personalized news-feed recommendation is one such system; it helps users on a platform stay updated with news from all over the globe. Every user has their own preferences and interests. Our aim here was to design one such system, able to show users what they are interested in and not bother them with unwanted material. There has been much past research on this topic, with each work handling the problem in its own way. The major idea behind these systems, however, remains more or less the same: to capture the user’s interests. Once the system is able to do that precisely, the recommendation part becomes easy. Different authors use different tools and technologies to do this. Some use Topic Modelling, others use Deep Learning, and some use different variations of hybrid recommendation systems. In this work, I have used Topic Modelling and the idea of penalizing topics based on what the user prefers to see and what the user does not. The whole program runs as a layer above the model generated using Latent Dirichlet Allocation.

Item Open Access
Human Action Recognition Using Deep Neural Networks (Dhirubhai Ambani Institute of Information and Communication Technology, 2017)
Thakkar, Shaival; Joshi, Manjunath V.
In this thesis, we present a hierarchical approach for human action classification using 3-D Convolutional neural networks (3-D CNN). Human actions refer to the positioning and movement of hands and legs and hence can be classified based on whether they are performed by the hands, by the legs or, in some cases, both. This is the intuition behind our work on hierarchical classification. In this work, we consider actions as tasks performed by hand or leg movements. Therefore, instead of using a single 3-D CNN to classify the given actions, we use multiple networks to perform the classification hierarchically: we first classify an action as a hand or leg action, and then use two separate networks for the hand and leg action classes to classify among the target action categories. In particular, we train three networks to classify six different actions, comprising three actions each for hands and legs. The use of 3-D CNNs enables automatic extraction of features in the spatial as well as the temporal domain, avoiding the need for hand-crafted features. This makes it one of the better approaches to video classification. We use the KTH dataset to evaluate our approach, and comparison with state-of-the-art methods shows that our approach outperforms most of them.

Item Open Access
Object Recognition using Self Learned Features (Dhirubhai Ambani Institute of Information and Communication Technology, 2016)
Parikh, Ketul D.; Joshi, Manjunath V.
A great deal of research has centered on developing algorithms for learning features from unlabelled data. Much progress has been made on benchmark datasets by using increasingly complex unsupervised learning algorithms and deep models. However, the time required to train such deep networks is a major drawback. This thesis presents a generalized trainable framework for object detection in static images. In this work, we have used a Convolutional Neural Network (CNN) for training and obtained good classification results in terms of accuracy. The main idea is to learn features from the data itself (in an unsupervised way) and then apply a classifier (in a supervised way). We have used a CNN to extract useful hierarchical features using natural images [39] as training images. The learned convolutional kernels (weights) are applied to the MNIST and CIFAR-10 datasets to extract their features. We then use a CNN for classification. Despite the simplicity of our network, we achieve accuracy as good as previously published results on the MNIST and CIFAR-10 datasets.
Keywords: Object recognition, deep learning, Deep Neural Network (DNN), Convolutional Neural Network (CNN).