Spectro-temporal features based automatic speech recognition

Nagpal, Ankit

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/554

Title:	Spectro-temporal features based automatic speech recognition
Authors:	Patil, Hemant A.; Nagpal, Ankit
Keywords:	Automatic speech recognition; Acoustics in engineering
Issue Date:	2015
Publisher:	Dhirubhai Ambani Institute of Information and Communication Technology
Citation:	Nagpal, Ankit (2015). Spectro-temporal features based automatic speech recognition. Dhirubhai Ambani Institute of Information and Communication Technology, xi, 53 p. (Acc.No: T00517)
Abstract:	Automatic speech recognition (ASR) technology has found applications in almost every field of life. Real-world environments are rarely noise-free, so deploying ASR in them means dealing with various kinds of noise and channel effects. Robustness of ASR is therefore becoming increasingly important. State-of-the-art Mel Frequency Cepstral Coefficients (MFCC) features capture spectral information and some temporal dynamics of the speech signal. Spectro-temporal features, on the other hand, are more physiologically motivated: they capture more perceptual information and perform better in the presence of noise. This thesis discusses cepstral analysis, the theory of cepstral coefficients (MFCC and Gammatone Frequency Cepstral Coefficients, i.e., GFCC), and the motivation for using spectro-temporal features. It also presents the theory behind Gabor filters and the motivation for incorporating them into the ASR task. The algorithm for extracting the spectro-temporal Gabor filterbank (GBFB) features is presented in detail. Experiments are carried out on the TIMIT database with various additive noises (white, babble, Volvo, and high frequency) at various SNR levels, to compare the spectro-temporal features, denoted GBFBmel+MFCC and the proposed GBFBGamm+GFCC (incorporating mel and Gammatone filters, respectively), with the state-of-the-art MFCC features. The experiments use HTK as the back end and take into account the effectiveness of the acoustic and language models. It is concluded that, with acoustic modeling only, GBFB features (whether incorporating a Gammatone or a mel filterbank), when concatenated with cepstral coefficients, perform better than the state-of-the-art MFCC features in clean conditions as well as in the presence of various additive noises or signal degradation conditions. This is because GBFB features capture more local joint spectro-temporal information from the speech signal than MFCC features do.
URI:	http://drsr.daiict.ac.in/handle/123456789/554
Appears in Collections:	M Tech Dissertations

Files in This Item:

File	Description	Size	Format
201311022.pdf (Restricted Access)		3.89 MB	Adobe PDF
