Please use this identifier to cite or link to this item:
http://drsr.daiict.ac.in//handle/123456789/685
Title: | Environmental Sound Classification (ESC) using Handcrafted and Learned Features |
Authors: | Patil, Hemant A.; Agrawal, Dharmeshkumar Maheshchandra |
Keywords: | Convolutional neural network; Teager energy operator; Gaussian mixture model |
Issue Date: | 2017 |
Publisher: | Dhirubhai Ambani Institute of Information and Communication Technology |
Citation: | Dharmeshkumar Maheshchandra Agrawal (2017). Environmental Sound Classification (ESC) using Handcrafted and Learned Features. Dhirubhai Ambani Institute of Information and Communication Technology. xi, 61 p. (Acc. No: T00649) |
Abstract: | "Environmental Sound Classification (ESC) is an important research field because of its applications in areas such as hearing aids and road surveillance systems for security and safety. Earlier work on the ESC task used the Mel Frequency Cepstral Coefficients (MFCC) feature set with a Gaussian Mixture Model (GMM) classifier. Recently, deep learning-based approaches have been used for the ESC task, such as Convolutional Neural Network (CNN)-based classification, which builds an end-to-end ESC system on a CNN framework. The ESC task is challenging because environmental sounds span many varied categories that are difficult to classify. In this thesis, we propose two new feature sets for the ESC task: a handcrafted feature set (i.e., a signal processing-based approach) and a data-driven feature set (i.e., a machine learning-based approach). For the handcrafted feature set, we propose a modified Gammatone filterbank with the Teager Energy Operator (TEO). We use two classifiers: a GMM with cepstral features and a CNN with spectral features. We performed experiments on two datasets, ESC-50 and UrbanSound8K. We compared the TEO-based coefficients with MFCC and Gammatone cepstral coefficients (GTCC), where GTCC uses mean square energy. The results show that score-level fusion of the proposed TEO-based Gammatone feature set and MFCC performs better than MFCC alone on both datasets with both the GMM and CNN classifiers. This indicates that the proposed TEO-based Gammatone features contain complementary information that is helpful in the ESC task. For the data-driven feature set, we use a Convolutional Restricted Boltzmann Machine (ConvRBM) to learn a filterbank from raw audio signals. ConvRBM is a generative model trained in an unsupervised way to model audio signals of arbitrary length. ConvRBM is trained using the annealed dropout technique, and its parameters are optimized using Adam optimization.
The subband filters that ConvRBM learned from the ESC-50 database resemble a Fourier basis in the mid-frequency range, while some of the low-frequency subband filters resemble a Gammatone basis. We use the proposed model as a front-end for the ESC task, with a supervised CNN as the back-end." |
URI: | http://drsr.daiict.ac.in//handle/123456789/685 |
Appears in Collections: | M Tech Dissertations |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
201511032.pdf (Restricted Access) | 201511032 | 2.82 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
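
Note on the Teager Energy Operator named in the abstract: the record does not reproduce the thesis's equations. The sketch below therefore uses the standard discrete-time definition, psi[x(n)] = x(n)^2 - x(n-1) x(n+1), as an assumption about the operator meant. The function name `teager_energy` and the test-tone parameters are hypothetical and chosen only for illustration. How the thesis combines TEO with its modified Gammatone filterbank is not shown here.

```python
import math


def teager_energy(x):
    """Discrete Teager energy for the interior samples of x.

    Assumed standard definition: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    The first and last samples have no two-sided neighbours, so the
    output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative check on a pure tone (hypothetical parameters).
# For x[n] = A * cos(omega * n), the operator is constant: (A * sin(omega))^2.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(64)]
psi = teager_energy(tone)
expected = (amplitude * math.sin(omega)) ** 2
```

For a pure tone the output depends on both amplitude and frequency, which is the usual motivation for using it in place of mean square energy, which depends on amplitude only.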