Environmental Sound Classification (ESC) using Handcrafted and Learned Features

Agrawal, Dharmeshkumar Maheshchandra

View/Open

201511032 (2.749Mb)

Date

2017

Author

Agrawal, Dharmeshkumar Maheshchandra

Metadata

Show full item record

Abstract

"Environmental Sound Classification (ESC) is an important research field due to its application in various field such as hearing aids, road surveillance system for security and safety purpose, etc. ESC task was earlier done using Coefficients (MFCC) feature set and Gaussian Mixture Model (GMM) classifier. Recently, deep-learning based approaches are used for ESC task such as Convolutional Neural Network (CNN) based classification which built an end-to-end system for ESC on CNN framework. The ESC task is a quite challenging problem as of environmental sounds that contains the various categories of sounds are difficult to classify. In this thesis, we proposed two new and different feature sets for ESC task, namely, handcrafted feature set (i.e., signal processing-based approach), and data-driven feature set (i.e., machine learning-based approach). In handcrafted feature set, we propose to use modified Gammatone filterbank with Teager Energy Operator (TEO) for ESC task. In this thesis, we have used two classifiers, namely, GMM using cepstral features, and CNN using spectral features. We performed experiments on two datasets, namely, ESC-50, and UrbanSound8K. We compared TEO-based coefficients with MFCC and Gammatone cepstral coefficients (GTCC), in which GTCC used mean square energy. The result shows score-level fusion of proposed TEO-based Gammatone feature-set and MFCC gave better performance than MFCC on both datasets by using GMM and CNN classifiers. This shows that proposed TEO-based Gammatone features contain complementary information, which is helpful in ESC task. In data-driven feature set, we use Convolutional Restricted Boltzmann Machine (ConvRBM) to learn filterbank from the raw audio signals. ConvRBM is a generative model trained in an unsupervised way to model the audio signals of arbitrary lengths. ConvRBM is trained using annealed dropout technique and parameters are optimized using Adam optimization. The subband filters of ConvRBM learned from the ESC-50 database resemble Fourier basis in the mid-frequency range, while some of the low frequency subband filters resemble Gammatone basis. We have used our proposed model as a front-end for the ESC task with supervised CNN as a back-end."

URI

http://drsr.daiict.ac.in//handle/123456789/685

Collections

M Tech Dissertations [923]