Spectro-temporal features based automatic speech recognition

Nagpal, Ankit

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Nagpal, Ankit
dc.date.accessioned	2017-06-10T14:43:04Z
dc.date.available	2017-06-10T14:43:04Z
dc.date.issued	2015
dc.identifier.citation	Nagpal, Ankit (2015). Spectro-temporal features based automatic speech recognition. Dhirubhai Ambani Institute of Information and Communication Technology, xi, 53 p. (Acc.No: T00517)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/554
dc.description.abstract	ASR technology has found application in almost every field of life. Today's world cannot be considered noise-free, and deploying ASR technology in such environments introduces the challenge of dealing with various kinds of noise and channel effects. Thus, the robustness of ASR is becoming increasingly important. State-of-the-art Mel Frequency Cepstral Coefficient (MFCC) features capture spectral information and some temporal dynamics of the speech signal. Spectro-temporal features, on the other hand, are more physiologically motivated, as they capture more perceptual information, and they perform better in the presence of noise. In this thesis, cepstral analysis, the theory of cepstral coefficients (MFCC and Gammatone Frequency Cepstral Coefficients, i.e., GFCC) and the motivation for using spectro-temporal features are discussed. Furthermore, the work presents the theory behind Gabor filters and the motivation for incorporating them into the ASR task. The algorithm for extracting spectro-temporal features, namely Spectro-Temporal Gabor filterbank (GBFB) features, is also presented in detail. Experiments are carried out on the TIMIT database with various additive noises, such as white, babble, Volvo and high-frequency noise (at various SNR levels), to compare the spectro-temporal features, denoted GBFBmel+MFCC and the proposed GBFBGamm+GFCC (incorporating mel and Gammatone filters, respectively), with the state-of-the-art MFCC features. Experiments are carried out with HTK as the back end, taking into account the effectiveness of the acoustic and language models. It is concluded that, with acoustic modeling only, spectro-temporal Gabor filterbank (GBFB) features (whether incorporating the Gammatone filterbank or the mel filterbank), when concatenated with cepstral coefficients, perform better than the state-of-the-art MFCC features in clean conditions as well as in the presence of various additive noises or signal degradation conditions.
This is because GBFB features capture more local, joint spectro-temporal information from the speech signal than MFCC features do.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Automatic speech recognition
dc.subject	Acoustics in engineering
dc.classification.ddc	006.54 NAG
dc.title	Spectro-temporal features based automatic speech recognition
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201311022
dc.accession.number	T00517
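
The abstract describes the GBFB front end only at a high level. As a rough illustration of its core operation, the sketch below builds one complex 2-D Gabor filter (a complex sinusoid at a spectral and a temporal modulation frequency, under a separable Hann envelope) and correlates it with a log spectrogram stored as channels x frames, keeping the real part of the response. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `hann`, `gabor_filter_2d` and `apply_gabor` are hypothetical, and the envelope sizes, modulation frequencies, DC-removal step and zero-padded correlation are illustrative choices rather than parameters taken from the thesis.

```python
import cmath
import math


def hann(length):
    """Symmetric Hann window whose endpoints are strictly positive."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (length + 1))
            for i in range(length)]


def gabor_filter_2d(omega_k, omega_n, size_k, size_n):
    """Complex 2-D Gabor filter indexed as g[k][n].

    omega_k: spectral modulation frequency (radians per frequency channel)
    omega_n: temporal modulation frequency (radians per frame)
    size_k, size_n: envelope lengths in channels and frames (hypothetical
    free parameters in this sketch, not values from the thesis)
    """
    env_k, env_n = hann(size_k), hann(size_n)
    k0, n0 = (size_k - 1) / 2, (size_n - 1) / 2
    env = [[ek * en for en in env_n] for ek in env_k]
    g = [[env[k][n] * cmath.exp(1j * (omega_k * (k - k0) + omega_n * (n - n0)))
          for n in range(size_n)] for k in range(size_k)]
    # Assumed step: remove the DC component of modulated filters so that a
    # flat spectrogram gives no response. Skipped for the (0, 0) low-pass
    # filter, which this subtraction would otherwise zero out entirely.
    if omega_k != 0 or omega_n != 0:
        dc = sum(sum(row) for row in g) / sum(sum(row) for row in env)
        g = [[g[k][n] - dc * env[k][n] for n in range(size_n)]
             for k in range(size_k)]
    return g


def apply_gabor(spec, g):
    """Correlate a (channels x frames) log spectrogram with filter g.

    Zero padding keeps the output the same size as `spec`; the real part
    of the complex response is returned.
    """
    n_ch, n_fr = len(spec), len(spec[0])
    size_k, size_n = len(g), len(g[0])
    hk, hn = size_k // 2, size_n // 2
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for k in range(n_ch):
        for n in range(n_fr):
            acc = 0j
            for dk in range(size_k):
                kk = k + dk - hk
                if not 0 <= kk < n_ch:
                    continue
                for dn in range(size_n):
                    nn = n + dn - hn
                    if 0 <= nn < n_fr:
                        acc += g[dk][dn] * spec[kk][nn]
            out[k][n] = acc.real
    return out


if __name__ == "__main__":
    # Hypothetical input: 23 channels x 40 frames of a flat log spectrogram.
    flat = [[1.0] * 40 for _ in range(23)]
    g = gabor_filter_2d(omega_k=math.pi / 4, omega_n=math.pi / 8,
                        size_k=9, size_n=17)
    out = apply_gabor(flat, g)
    # Away from the zero-padded borders the DC-free filter gives ~0 output.
    print(round(out[11][20], 6))
```

In a GBFB-style front end, a bank of such filters with different (omega_k, omega_n) pairs would be applied to a mel or Gammatone log spectrogram, a subset of channels from each real-valued output would be kept per frame, and the resulting vector would be concatenated with the MFCC or GFCC of the same frame, in the spirit of the GBFBmel+MFCC and GBFBGamm+GFCC combinations named in the abstract.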

Files in this item

Name:: 201311022.pdf
Size:: 3.803Mb
Format:: PDF

This item appears in the following Collection(s)

M Tech Dissertations [923]
