Title: Person identification from their hum with inter-session variability compensation
Authors: Patil, Hemant A.
Patel, Chirag R.
Keywords: Speaker
Emotion Recognition System
Speaker Recognition
Automatic Speech Recognition System
Multimodal Integration
Audio-visual recognition
Person identification
Issue Date: 2012
Publisher: Dhirubhai Ambani Institute of Information and Communication Technology
Citation: Patel, Chirag R. (2012). Person identification from their hum with inter-session variability compensation. Dhirubhai Ambani Institute of Information and Communication Technology, xiii, 70 p. (Acc.No: T00353)
Abstract: This thesis discusses the design of a system that recognizes a person from their hum, with emphasis on inter-session variability. No standard database is available for studying inter-session variability in humming-based person recognition, so a humming database of 50 subjects was collected over two training sessions and six testing sessions. MFCC (Mel Frequency Cepstral Coefficients) is the state-of-the-art feature set in speech and speaker recognition. In this thesis, another cepstral feature, VTMFCC (Variable length Teager energy based MFCC), which captures vocal source information, is used along with MFCC. Two modulation-based features, AM-FM and Q-features, are also introduced. The performance of all four features in a multi-session environment is evaluated using a discriminatively trained polynomial classifier, which uses out-of-class information while creating each person-specific model. Inter-session variability degrades the performance of person recognition systems because of differences between training and test sessions. This variability can be classified as intrinsic or extrinsic according to its source. Variability due to the speaker's health, aging, emotional state, etc. is called intrinsic inter-session variability. Variability due to environmental conditions, noise, and changes in microphone and acoustic channel is called extrinsic inter-session variability. Inter-session variability degrades the performance of all four features, i.e., MFCC, VTMFCC, AM-FM and Q-feature. The difference in % EER (Equal Error Rate) between a particular test session and the base test session is used as the measure of inter-session variability. The base test session is the test session collected together with the training session.
In this thesis, two new approaches are proposed for compensating inter-session variability, viz., feature-level fusion and model-level fusion. Both approaches reduce the performance degradation that inter-session variability causes in the person recognition system and make the system more robust.
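The variability measure described in the abstract (the % EER of a test session minus the % EER of the base test session) can be illustrated with a minimal sketch. This is not the thesis code: the function names `equal_error_rate` and `session_variability`, the threshold-sweep approximation of the EER, and every score and EER value shown are assumptions made up for illustration, not results from the thesis.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER (in percent) by sweeping a decision threshold
    over every observed score. Assumes a higher score means a better match."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # The EER lies where FAR and FRR cross; take their mean at the
            # threshold where they are closest.
            best_gap, eer = gap, 100.0 * (far + frr) / 2.0
    return eer


def session_variability(eer_by_session, base_session):
    """Return (% EER of session) - (% EER of base session) for each
    non-base test session."""
    base = eer_by_session[base_session]
    return {
        name: eer - base
        for name, eer in eer_by_session.items()
        if name != base_session
    }


# Hypothetical scores and session EERs, for illustration only.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
print(equal_error_rate(genuine_scores, impostor_scores))
print(session_variability({"base": 5.0, "session_2": 8.5}, "base"))
```

A larger positive difference for a session indicates a greater loss of performance relative to the base test session, which was recorded together with the training data.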
Appears in Collections: M Tech Dissertations

Files in This Item:
Description: Restricted Access
Size: 1.93 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.