dc.contributor.advisor: Patil, Hemant A.
dc.contributor.author: Nagpal, Ankit
dc.date.accessioned: 2017-06-10T14:43:04Z
dc.date.available: 2017-06-10T14:43:04Z
dc.date.issued: 2015
dc.identifier.citation: Nagpal, Ankit (2015). Spectro-temporal features based automatic speech recognition. Dhirubhai Ambani Institute of Information and Communication Technology, xi, 53 p. (Acc.No: T00517)
dc.identifier.uri: http://drsr.daiict.ac.in/handle/123456789/554
dc.description.abstract: Automatic speech recognition (ASR) technology has found applications in almost every field of life. Real-world environments are rarely noise-free, and deploying ASR in them means dealing with various kinds of noise and channel effects. Thus, the robustness of ASR is becoming increasingly important. State-of-the-art Mel Frequency Cepstral Coefficients (MFCC) features capture spectral information and some temporal dynamics of the speech signal. Spectro-temporal features, on the other hand, are more physiologically motivated: they capture more perceptual information and perform better in the presence of noise. This thesis discusses cepstral analysis, the theory of cepstral coefficients (MFCC and Gammatone Frequency Cepstral Coefficients, i.e., GFCC), and the motivation for using spectro-temporal features. It also presents the theory behind Gabor filters and the motivation for incorporating them in the ASR task. The algorithm for extracting the spectro-temporal Gabor filterbank (GBFB) features is presented in detail. Experiments are carried out on the TIMIT database with various additive noises (white, babble, volvo, and high-frequency) at various SNR levels, comparing the spectro-temporal features, denoted GBFBmel+MFCC and the proposed GBFBGamm+GFCC (incorporating mel and Gammatone filters, respectively), against the state-of-the-art MFCC features. The experiments use HTK as the back end and take into account the effectiveness of the acoustic and language models. It is concluded that, with acoustic modeling only, GBFB features (whether incorporating the Gammatone or the mel filterbank), when concatenated with cepstral coefficients, perform better than the state-of-the-art MFCC features in clean conditions as well as in the presence of various additive noises or signal degradation conditions. This is because GBFB features capture more local joint spectro-temporal information from the speech signal than MFCC features do.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Automatic speech recognition
dc.subject: Acoustics in engineering
dc.classification.ddc: 006.54 NAG
dc.title: Spectro-temporal features based automatic speech recognition
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 201311022
dc.accession.number: T00517

