Please use this identifier to cite or link to this item:
http://drsr.daiict.ac.in//handle/123456789/974
Title: | Deep learning techniques for speech pathology applications |
Authors: | Patil, Hemant A.; Purohit, Mirali Virendrabhai |
Keywords: | Machine learning; Deep learning; Weak supervision; Generative adversarial network; Dysarthria; Whisper; Voice conversion |
Issue Date: | 2020 |
Citation: | Purohit, Mirali Virendrabhai (2020). Deep learning techniques for speech pathology applications. Dhirubhai Ambani Institute of Information and Communication Technology. xiv, 113 p. (Acc.No: T00892) |
Abstract: | Human-machine interaction has gained increasing attention due to its applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision and the medical domain. There has been massive success in developing speech-based systems, e.g., Intelligent Personal Assistants (IPAs), chatbots, and Text-To-Speech (TTS) systems. However, these systems have certain limitations. Speech processing systems work efficiently only on normal-mode speech and hence perform poorly on other kinds of speech, such as impaired speech, far-field speech, and shouted speech. This thesis aims to improve how such systems handle impaired speech. To address this problem, the work takes two major approaches: 1) classification and 2) conversion. A new paradigm, namely, weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, a residual network-based classifier is shown to be more effective than a traditional convolutional neural network-based model for multi-class classification of pathological speech. Further, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech and thereby improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by the VC-based work, this thesis also contributes to the voice conversion field. To that effect, a state-of-the-art system, namely, the adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art method for voice conversion. |
URI: | http://drsr.daiict.ac.in//handle/123456789/974 |
Appears in Collections: | M Tech Dissertations |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
201811067.pdf (Restricted Access) | | 5.41 MB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.