Environmental Sound Classification (ESC) using Handcrafted and Learned Features

Agrawal, Dharmeshkumar Maheshchandra

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/685

Title:	Environmental Sound Classification (ESC) using Handcrafted and Learned Features
Authors:	Patil, Hemant A.; Agrawal, Dharmeshkumar Maheshchandra
Keywords:	Convolutional neural network; Teager energy operator; Gaussian mixture model
Issue Date:	2017
Publisher:	Dhirubhai Ambani Institute of Information and Communication Technology
Citation:	Dharmeshkumar Maheshchandra Agrawal (2017). Environmental Sound Classification (ESC) using Handcrafted and Learned Features. Dhirubhai Ambani Institute of Information and Communication Technology. xi, 61 p. (Acc. No: T00649)
Abstract:	"Environmental Sound Classification (ESC) is an important research field due to its applications in various fields, such as hearing aids and road surveillance systems for security and safety purposes. The ESC task was earlier performed using the Mel Frequency Cepstral Coefficients (MFCC) feature set and a Gaussian Mixture Model (GMM) classifier. Recently, deep learning-based approaches have been used for the ESC task, such as Convolutional Neural Network (CNN)-based classification, which builds an end-to-end ESC system on the CNN framework. The ESC task is quite challenging because environmental sounds span many varied categories that are difficult to classify. In this thesis, we propose two new and different feature sets for the ESC task, namely, a handcrafted feature set (i.e., a signal processing-based approach) and a data-driven feature set (i.e., a machine learning-based approach). In the handcrafted feature set, we propose to use a modified Gammatone filterbank with the Teager Energy Operator (TEO). We use two classifiers, namely, a GMM using cepstral features and a CNN using spectral features. We performed experiments on two datasets, namely, ESC-50 and UrbanSound8K. We compared the TEO-based coefficients with MFCC and Gammatone cepstral coefficients (GTCC), where GTCC uses mean square energy. The results show that score-level fusion of the proposed TEO-based Gammatone feature set and MFCC gave better performance than MFCC alone on both datasets with both the GMM and CNN classifiers. This shows that the proposed TEO-based Gammatone features contain complementary information that is helpful in the ESC task. In the data-driven feature set, we use a Convolutional Restricted Boltzmann Machine (ConvRBM) to learn a filterbank from the raw audio signals. ConvRBM is a generative model trained in an unsupervised way to model audio signals of arbitrary length. The ConvRBM is trained using the annealed dropout technique, and its parameters are optimized using Adam optimization.
The subband filters of the ConvRBM learned from the ESC-50 database resemble the Fourier basis in the mid-frequency range, while some of the low-frequency subband filters resemble the Gammatone basis. We have used our proposed model as a front-end for the ESC task with a supervised CNN as a back-end."
URI:	http://drsr.daiict.ac.in//handle/123456789/685
Appears in Collections:	M Tech Dissertations
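
The abstract names the Teager Energy Operator (TEO) as the energy measure in the handcrafted feature set. As a rough illustration only, the sketch below applies the standard discrete-time TEO, psi[n] = x[n]^2 - x[n-1] * x[n+1], to a raw signal. It does not reproduce the thesis's modified Gammatone filterbank or its cepstral processing, and the names teager_energy, tone, amplitude and omega are hypothetical, chosen for this example.

```python
import math


def teager_energy(x):
    """Standard discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Assumption: this is the textbook definition, not necessarily the exact
    variant used in the thesis.
    """
    # One neighbour is needed on each side, so the output is two samples shorter.
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n), the operator yields the constant A^2 * sin(omega)^2,
# so the output depends on both the amplitude and the frequency of the tone.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(64)]
tone_energy = teager_energy(tone)
```

In a pipeline like the one the abstract describes, such an operator would presumably be applied to each subband signal at the filterbank output, in place of the mean square energy used for GTCC.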

Files in This Item:

File	Description	Size	Format
201511032.pdf (Restricted Access)	201511032	2.82 MB	Adobe PDF
