Environmental Sound Classification (ESC) using Handcrafted and Learned Features

Agrawal, Dharmeshkumar Maheshchandra

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Agrawal, Dharmeshkumar Maheshchandra
dc.date.accessioned	2018-05-17T09:29:56Z
dc.date.available	2018-05-17T09:29:56Z
dc.date.issued	2017
dc.identifier.citation	Dharmeshkumar Maheshchandra Agrawal (2017). Environmental Sound Classification (ESC) using Handcrafted and Learned Features. Dhirubhai Ambani Institute of Information and Communication Technology. xi, 61 p. (Acc.No: T00649)
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/685
dc.description.abstract	"Environmental Sound Classification (ESC) is an important research field because of its applications in areas such as hearing aids and road surveillance systems for security and safety. ESC was earlier performed using the Mel Frequency Cepstral Coefficients (MFCC) feature set and a Gaussian Mixture Model (GMM) classifier. More recently, deep learning-based approaches have been applied to ESC, such as Convolutional Neural Network (CNN)-based classification, which builds an end-to-end ESC system on a CNN framework. ESC is a challenging problem because environmental sounds span many diverse categories that are difficult to classify. In this thesis, we propose two new and different feature sets for the ESC task, namely, a handcrafted feature set (i.e., a signal processing-based approach) and a data-driven feature set (i.e., a machine learning-based approach). For the handcrafted feature set, we propose to use a modified Gammatone filterbank with the Teager Energy Operator (TEO). We use two classifiers, namely, GMM with cepstral features and CNN with spectral features. We performed experiments on two datasets, namely, ESC-50 and UrbanSound8K. We compared the TEO-based coefficients with MFCC and Gammatone cepstral coefficients (GTCC), where GTCC uses mean square energy. The results show that score-level fusion of the proposed TEO-based Gammatone feature set and MFCC gave better performance than MFCC alone on both datasets with both the GMM and CNN classifiers. This shows that the proposed TEO-based Gammatone features contain complementary information that is helpful for the ESC task. For the data-driven feature set, we use a Convolutional Restricted Boltzmann Machine (ConvRBM) to learn a filterbank from raw audio signals. ConvRBM is a generative model trained in an unsupervised way to model audio signals of arbitrary length. ConvRBM is trained using the annealed dropout technique, and its parameters are optimized using the Adam optimizer. The subband filters of ConvRBM learned from the ESC-50 database resemble Fourier basis functions in the mid-frequency range, while some of the low-frequency subband filters resemble Gammatone basis functions. We have used the proposed model as a front-end for the ESC task, with a supervised CNN as the back-end."
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Convolutional neural network
dc.subject	Teager energy operator
dc.subject	Gaussian mixture model
dc.classification.ddc	600.32 AGR
dc.title	Environmental Sound Classification (ESC) using Handcrafted and Learned Features
dc.type	Dissertation
dc.degree	M.Tech.
dc.student.id	201511032
dc.accession.number	T00649
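The abstract contrasts TEO-based Gammatone features with GTCC, which rely on mean square energy. As a minimal sketch (not the thesis's implementation), the discrete Teager Energy Operator, Ψ[x(n)] = x²(n) − x(n−1)x(n+1), can be compared with mean square energy on a pure tone. For x(n) = A cos(Ωn + φ) the TEO equals A² sin²(Ω) at every sample, so unlike mean square energy (A²/2) it depends on frequency as well as amplitude. The Gammatone filtering stage is omitted, and the helper names, frame averaging, sampling rate and test tones are assumptions made for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager Energy Operator (TEO).

    psi[n] = x[n]**2 - x[n-1] * x[n+1], evaluated for 1 <= n <= len(x) - 2,
    so the output is two samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def mean_square_energy(frame):
    """Conventional frame energy (the kind of energy GTCC is based on)."""
    return sum(v * v for v in frame) / len(frame)


def mean_teager_energy(frame):
    """Frame-averaged TEO output.

    Hypothetical helper: averaging the TEO over a frame is an assumption
    made for illustration, not the thesis's exact feature computation.
    """
    psi = teager_energy(frame)
    return sum(psi) / len(psi)


if __name__ == "__main__":
    fs = 16000.0   # sampling rate in Hz (assumed for the demo)
    amp = 0.5      # tone amplitude
    for f in (200.0, 2000.0):
        omega = 2 * math.pi * f / fs  # digital frequency in rad/sample
        # 400 samples span an integer number of periods for both tones
        tone = [amp * math.cos(omega * n + 0.3) for n in range(400)]
        # Mean square energy is amp**2 / 2 at both frequencies, whereas the
        # TEO equals amp**2 * sin(omega)**2, so it grows with frequency.
        print(f"{f:6.0f} Hz  MSE={mean_square_energy(tone):.6f}  "
              f"TEO={mean_teager_energy(tone):.6f}  "
              f"theory={(amp * math.sin(omega)) ** 2:.6f}")
```

In the thesis, the operator is applied to the subband outputs of a modified Gammatone filterbank; here a single pure tone stands in for one subband signal, which is enough to show why TEO-based energy carries information that mean square energy does not.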

Files in this item

Name: 201511032.pdf
Size: 2.749Mb
Format: PDF
Description: 201511032


This item appears in the following Collection(s)

M Tech Dissertations [923]
