Deep learning techniques for speech pathology applications

Purohit, Mirali Virendrabhai

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/974

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Purohit, Mirali Virendrabhai
dc.date.accessioned	2020-09-22T14:19:41Z
dc.date.available	2023-02-17T14:19:41Z
dc.date.issued	2020
dc.identifier.citation	Purohit, Mirali Virendrabhai (2020). Deep learning techniques for speech pathology applications. Dhirubhai Ambani Institute of Information and Communication Technology. xiv, 113 p. (Acc.No: T00892)
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/974
dc.description.abstract	Human-machine interaction has gained increasing attention due to its applications in industry and day-to-day life. In recent years, speech technologies have grown rapidly because of advances in machine learning and deep learning. Various deep learning architectures have shown state-of-the-art results in different areas, such as computer vision and the medical domain. Considerable success has been achieved in developing speech-based systems, e.g., Intelligent Personal Assistants (IPAs), chatbots, and Text-To-Speech (TTS) systems. However, these systems have certain limitations. Speech processing systems work well only on normal-mode speech and hence perform poorly on other kinds of speech, such as impaired speech, far-field speech, and shouted speech. This thesis contributes toward the improvement of impaired speech. To address this problem, this work takes two major approaches: 1) classification, and 2) conversion. A new paradigm, namely weak speech supervision, is explored to overcome the data scarcity problem and is proposed for the classification task. In addition, the effectiveness of a residual network-based classifier over a traditional convolutional neural network-based model is shown for the multi-class classification of pathological speech. Furthermore, using Voice Conversion (VC)-based techniques, variants of generative adversarial networks are proposed to repair impaired speech and thereby improve the performance of Voice Assistants (VAs). The performance of these architectures is shown via objective and subjective evaluations. Inspired by the work done using the VC-based technique, this thesis also contributes to the voice conversion field. To that end, a state-of-the-art system, namely the adaptive generative adversarial network, is proposed and analyzed by comparing it with a recent state-of-the-art method for voice conversion.
dc.subject	Machine learning
dc.subject	Deep learning
dc.subject	Weak supervision
dc.subject	Generative adversarial network
dc.subject	Dysarthria
dc.subject	Whisper
dc.subject	Voice conversion
dc.classification.ddc	006.454 PUR
dc.title	Deep learning techniques for speech pathology applications
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201811067
dc.accession.number	T00892
Appears in Collections:	M Tech Dissertations

Files in This Item:

File	Description	Size	Format
201811067.pdf (Restricted Access)		5.41 MB	Adobe PDF
