Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/785
Title: Auditory representation learning
Authors: Patil, Hemant A.
Sailor, Hardik B.
Keywords: Representation learning
Deep learning
Filterbank learning
Speech Databases
Sound classification
Auditory model
Speech signal
Signal processing
Audio processing
Speech recognition
Speech detection
Issue Date: 2018
Publisher: Dhirubhai Ambani Institute of Information and Communication Technology
Citation: Sailor, Hardik B. (2018). Auditory Representation Learning. Dhirubhai Ambani Institute of Information and Communication Technology, xxv, 218 p. (Acc. No: T00688)
Abstract: Representation learning (RL), or feature learning, has had a significant impact on signal processing applications. The goal of RL approaches is to learn meaningful representations directly from the data that can be helpful to a pattern classifier. In particular, unsupervised RL has gained significant interest for feature learning in various signal processing areas, including speech and audio processing. Recently, various RL methods have been used to learn auditory-like representations from speech signals or their spectral representations. In this thesis, we propose a novel auditory representation learning model based on the Convolutional Restricted Boltzmann Machine (ConvRBM). Auditory-like subband filters are learned when the model is trained directly on raw speech and audio signals of arbitrary length. The learned auditory frequency scale is also nonlinear, similar to standard auditory frequency scales; however, the ConvRBM frequency scale is adapted to the statistics of the sounds. The primary motivation for developing our model is its application to the Automatic Speech Recognition (ASR) task. Experiments on standard ASR databases show that the ConvRBM filterbank performs better than the Mel filterbank. A stability analysis of the model is presented using the Lipschitz continuity condition. The proposed model is further improved using annealed dropout and Adam optimization. Noise-robust representations are achieved by combining the ConvRBM filterbank with energy estimation using the Teager Energy Operator (TEO). As part of the research work for a consortium project sponsored by MeitY, Govt. of India, the ConvRBM is used as a front-end for the ASR system in speech-based access to agricultural commodities in the Gujarati language.
Inspired by its success in the ASR task, we applied our model to three audio classification tasks, namely, Environmental Sound Classification (ESC), synthetic and replay Spoof Speech Detection (SSD) in the context of Automatic Speaker Verification (ASV), and Infant Cry Classification (ICC). We further propose a two-layer auditory model obtained by stacking two ConvRBMs. We refer to it as an Unsupervised Deep Auditory Model (UDAM); it performed better than the single-layer ConvRBM in the ASR task.
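The abstract mentions energy estimation with the Teager Energy Operator (TEO). As background, the discrete-time TEO is commonly defined as Psi[x(n)] = x(n)^2 - x(n-1)x(n+1); the sketch below is an illustrative implementation of that standard formula, not code from the thesis:

```python
import numpy as np

def teager_energy(x):
    """Discrete-time Teager Energy Operator.

    Psi[x(n)] = x(n)^2 - x(n-1) * x(n+1),
    evaluated for n = 1 .. len(x) - 2 (the endpoints have no
    valid neighbours, so the output is two samples shorter).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure tone x(n) = A * cos(omega * n), the TEO output is
# approximately the constant A^2 * sin(omega)^2, i.e. it tracks
# both amplitude and frequency of the signal.
```

For a subband signal (e.g. the output of one learned ConvRBM filter), applying the TEO followed by smoothing yields a noise-robust energy estimate of the kind the abstract refers to.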
URI: http://drsr.daiict.ac.in//handle/123456789/785
Appears in Collections:PhD Theses

Files in This Item:
File: 201321002_Hardik B. Sailor.pdf
Description: 201321002
Size: 14.63 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.