M Tech Dissertations

Permanent URI for this collection: http://drsr.daiict.ac.in/handle/123456789/3

Search Results

Now showing 1 - 6 of 6
  • Item (Open Access)
    Deep Learning for Severity Level-based Classification of Dysarthria
    (2021) Gupta, Siddhant; Patil, Hemant A.
    Dysarthria is a motor speech disorder in which the muscles required for speech are damaged or paralyzed, which impairs articulation and renders the speaker's voice unintelligible. Dysarthria is considered to be one of the most common forms of speech disorder. It occurs as a result of several neurological and neurodegenerative diseases, such as Parkinson’s disease and cerebral palsy. People suffering from dysarthria face difficulties in conveying vocal messages and emotions, which in many cases lead to depression and social isolation. Dysarthria has become a major speech technology issue because systems that work efficiently for normal speech, such as Automatic Speech Recognition systems, do not provide satisfactory results for the corresponding dysarthric speech. In addition, people suffering from dysarthria are generally limited in their motor functions, so the development of voice-assisted systems for them becomes all the more crucial. Furthermore, analysis and classification of dysarthric speech can be useful in tracking the progression of the disease and its treatment in a patient. In this thesis, dysarthria has been studied as a speech technology problem, with the aim of classifying dysarthric speech into four severity levels. Since people with dysarthria have difficulty producing long utterances, short-duration speech segments (maximum 1 s) have been used for the task, to explore the practical applicability of the thesis work. In addition, dysarthric speech has been analyzed using different methods, such as time-domain waveforms, the Linear Prediction profile, the Teager Energy Operator profile, and the Short-Time Fourier Transform, to identify the most representative feature for the classification task. With the rise of Artificial Intelligence, deep learning techniques have been gaining significant popularity in machine classification and pattern recognition tasks. Therefore, to keep the thesis work relevant, several machine learning and deep learning techniques, such as Gaussian Mixture Models (GMM), Convolutional Neural Network (CNN), Light Convolutional Neural Network (LCNN), and Residual Neural Network (ResNet), have been adopted. The severity level-based classification task has been evaluated using popular measures such as classification accuracy and F1-score. In addition, for comparison with short-duration speech, classification has also been done on long-duration speech (more than 1 s). Furthermore, to enhance the relevance of the work, experiments have been performed on the statistically meaningful and widely used Universal Access-Speech Corpus.
  • Item (Open Access)
    Deep learning techniques for speech pathology applications
    (2020) Purohit, Mirali Virendrabhai; Patil, Hemant A.
    Human-machine interaction has gained more attention due to its interesting applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision and the medical domain. Massive success has been achieved in developing speech-based systems, e.g., Intelligent Personal Assistants (IPAs), chatbots, and Text-To-Speech (TTS) systems. However, these systems have certain limitations. Speech processing systems work efficiently only on normal-mode speech and hence perform poorly on other kinds of speech, such as impaired speech, far-field speech, and shouted speech. This thesis contributes to the improvement of impaired speech. To address this problem, the work takes two major approaches: 1) classification, and 2) conversion. A new paradigm, namely weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, the effectiveness of a residual network-based classifier over the traditional convolutional neural network-based model is shown for the multi-class classification of pathological speech. Furthermore, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech and thereby improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by the work done using the VC-based technique, this thesis also contributes to the voice conversion field. To that effect, a state-of-the-art system, namely the adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art method for voice conversion.
  • Item (Open Access)
    Apparel attributes classification using deep learning
    (2020) Desai, Harsh Sanjaykumar; Jat, P.M
    Apparel attributes classification finds practical applications in e-commerce. The project is for the www.Blibli.com website, which is an e-commerce platform in Indonesia and a partner of Coviam Technologies. This report describes an approach to classifying attributes specific to various apparel items, such as material, neck/collar, and sleeve type, using Natural Language Processing and Deep Learning techniques. The products classified by attribute will be used as filters on the search results page to improve the website's search mechanism. We have classified 95% of apparel products based on the material attribute and achieved 87% test accuracy on neck/collar attribute classification. The report is divided into four main parts: Introduction, Dataset Preparation, Methodology, and Experimentation. Lastly, other similar work performed during the internship is discussed along with future work.
  • Item (Open Access)
    Clickbait detection using deep learning Techniques
    (2020) Parikh, Apurva Ketanbhai; Majumder, Prasenjit
    With the growing shift towards news consumption primarily through social media sites like Twitter and Facebook, most news agencies are promoting their stories on social media platforms. These news agencies are publishing fake news on social media to generate revenue by enticing users to click on their articles. To increase the number of readers, agencies use eye-catching headlines accompanied by an article link, which attract the reader to read the article. These attractive headlines are called clickbait. Usually, a clickbait article does not meet the expectations of the user. In this work, we try to develop an end-to-end clickbait detection system using a Transformer-based model, Bidirectional Encoder Representations from Transformers (BERT). We also found a few clickbait-specific features which, we hypothesised, can be utilised along with the BERT model to develop a better classifier. Our proposed approach using BERT significantly outperformed the baseline paper, which utilised BiLSTM.
  • Item (Open Access)
    Applications of deep-learning at digital communication receiver
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Nanavati, Tilak Digantkumar; Vasavada, Yash
    Modulation and demodulation are fundamental modules of communication systems. The modulation techniques Offset QPSK (OQPSK), π/2 BPSK, π/4 QPSK and GMSK are frequently applied in power-constrained wireless communication links (e.g., the terminal transmission links of several 2G, 3G and 4G terrestrial and satellite air-interface standards). However, a detailed numerical comparison of their performance and functional characteristics is currently lacking in the literature. Prior studies have focused on a comparison of at most two of these four schemes (typically OQPSK versus GMSK). One of the objectives of this thesis is to bridge this gap. We provide a detailed comparison of (i) the spectral regrowth and (ii) the probability of bit error (Perrb) versus Eb/N0 performance of these four modulation schemes in the presence of AM/AM and AM/PM non-linearities with varying backoff (BO). We believe that our results and key observations will be beneficial in selecting an appropriate modulation technique when designing practical communication systems. Another crucial component of communication and signal processing systems is the estimation of channel parameters. In practical communication systems, varying channel conditions and non-linear channel impairments make the task of estimation more challenging. We propose a Deep Learning (DL) application at the digital communication receiver to estimate channel impairments that are difficult to describe with a rigid, mathematically tractable model. Another objective of our research work is to develop a learned parameter estimator that effectively captures the non-linear functional mappings and produces accurate estimates. The results for Phase Offset (PO) impairment estimation obtained with our proposed approach show accuracy competitive with that of the baseline equivalent. Lastly, we demonstrate a learning-based modulation classifier that potentially solves the misclassification problem presented in an earlier study.
  • Item (Open Access)
    Augmenting dialogue generation using dialogue act embeddings: a transfer learning approach
    (Dhirubhai Ambani Institute of Information and Communication Technology, 2020) Bisht, Abhimanyu Singh; Majumder, Prasenjit
    The following work looks at contemporary end-to-end dialogue systems with the aim of improving dialogue generation in an open-domain setting. It provides an overview of popular literature in the domain of dialogue generation, followed by a brief look at how human dialogue is understood from the perspective of Linguistics and Cognitive Science. We try to extract useful ideas from these domains of research and implement them in a transfer learning approach where a pretrained language model is supplemented with dialogue act information using special embeddings. The hypothesis behind the proposed approach is that the dialogue act information will aid the generation process. The proposed approach is then compared with a baseline approach on their performance on the DailyDialog[12] dataset using perplexity as the evaluation metric. Though the proposed approach is a significant improvement over the baseline, the contribution of the Dialogue Act Embeddings in the development is shown to be marginal via ablation analysis.
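
The last abstract above uses perplexity as its evaluation metric. As a rough illustration only, perplexity can be computed as the exponential of the average negative log-likelihood that a model assigns to the reference tokens. The function name and the per-token probabilities below are hypothetical and are not taken from the dissertation; they only show how a lower perplexity corresponds to a model that assigns higher probability to the reference response.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of the reference tokens.

    Computes exp of the mean negative log-likelihood; lower is better.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities that two models assign to the
# same four-token reference response (illustrative numbers only).
baseline_log_probs = [math.log(p) for p in (0.10, 0.20, 0.05, 0.25)]
proposed_log_probs = [math.log(p) for p in (0.20, 0.25, 0.10, 0.25)]

if __name__ == "__main__":
    print("baseline perplexity:", perplexity(baseline_log_probs))
    print("proposed perplexity:", perplexity(proposed_log_probs))
```

A model that assigns probability 1/k to every reference token has perplexity k, so the metric can be read as an effective number of equally likely next-token choices.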