Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/785
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Patil, Hemant A.
dc.contributor.author | Sailor, Hardik B.
dc.date.accessioned | 2019-03-19T10:52:15Z
dc.date.available | 2019-03-19T10:52:15Z
dc.date.issued | 2018
dc.identifier.citation | Sailor, Hardik B. (2018). Auditory Representation Learning. Dhirubhai Ambani Institute of Information and Communication Technology, xxv, 218 p. (Acc. No: T00688)
dc.identifier.uri | http://drsr.daiict.ac.in//handle/123456789/785
dc.description.abstract | Representation learning (RL), or feature learning, has had a major impact on signal processing applications. The goal of RL approaches is to learn, directly from the data, a meaningful representation that helps the pattern classifier. In particular, unsupervised RL has gained significant interest for feature learning in various signal processing areas, including speech and audio processing. Recently, various RL methods have been used to learn auditory-like representations from speech signals or their spectral representations. In this thesis, we propose a novel auditory representation learning model based on the Convolutional Restricted Boltzmann Machine (ConvRBM). Auditory-like subband filters are learned when the model is trained directly on raw speech and audio signals of arbitrary length. The learned auditory frequency scale is also nonlinear, like the standard auditory frequency scales; however, the ConvRBM frequency scale is adapted to the sound statistics. The primary motivation for developing our model is to apply it to the Automatic Speech Recognition (ASR) task. Experiments on the standard ASR databases show that the ConvRBM filterbank performs better than the Mel filterbank. The stability analysis of the model is presented using the Lipschitz continuity condition. The proposed model is improved by using annealing dropout and Adam optimization. A noise-robust representation is achieved by combining the ConvRBM filterbank with energy estimation using the Teager Energy Operator (TEO). As part of the research work for the consortium project sponsored by MeitY, Govt. of India, the ConvRBM is used as a front-end for the ASR system for speech-based access to agricultural commodities in the Gujarati language.
Inspired by the success in the ASR task, we applied our model to three audio classification tasks, namely, Environmental Sound Classification (ESC), synthetic and replay Spoof Speech Detection (SSD) in the context of Automatic Speaker Verification (ASV), and Infant Cry Classification (ICC). We further propose a two-layer auditory model built by stacking two ConvRBMs. We refer to it as the Unsupervised Deep Auditory Model (UDAM); it performed well compared to the single-layer ConvRBM in the ASR task.
dc.publisher | Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject | Representation learning
dc.subject | Deep learning
dc.subject | Filterbank learning
dc.subject | Speech Databases
dc.subject | Sound classification
dc.subject | Auditory model
dc.subject | Speech signal
dc.subject | Signal processing
dc.subject | Audio processing
dc.subject | Speech recognition
dc.subject | Speech detection
dc.classification.ddc | 006.454 SAI
dc.title | Auditory representation learning
dc.type | Thesis
dc.degree | Ph.D
dc.student.id | 201321002
dc.accession.number | T00688
Appears in Collections: PhD Theses

Files in This Item:
File | Description | Size | Format
201321002_Hardik B. Sailor.pdf | 201321002 | 14.63 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.