dc.description.abstract | This thesis discusses the design of a system that recognizes persons from their hum. Emphasis
is placed on the inter-session variability of the recognition system. No standard database is
available for studying the inter-session variability of humming-based person recognition systems.
Therefore, a humming database of 50 subjects was collected in two training and six testing
sessions. MFCC (Mel Frequency Cepstral Coefficients) is the state-of-the-art feature set
in the field of speech and speaker recognition. In this thesis, another cepstral
feature, viz., VTMFCC (Variable-length Teager-energy-based MFCC), is used along with MFCC.
VTMFCC captures vocal source information. Two modulation-based features, viz., AM-FM
and Q-features, are introduced in this thesis. The performance of all four features in a
multi-session environment is evaluated using a discriminatively trained polynomial classifier.
The polynomial classifier uses out-of-class information while creating a person-specific
model. Inter-session variability degrades the performance of person recognition systems due
to differences between the training and test sessions. This variability can be classified as intrinsic
or extrinsic according to its source. Inter-session variability
due to the speaker’s health, aging, emotional state, etc. is called intrinsic inter-session
variability. Session variability due to environmental conditions, noise, and changes in
microphone and acoustic channel is called extrinsic inter-session variability. The inter-session
variability degrades the performance of all four features, i.e., MFCC, VTMFCC, AM-FM, and Q-features.
The difference in % EER (Equal Error Rate) between a particular test session and the base test
session is used as the measure of inter-session variability. The base test session is the test
session collected together with the training session. In this thesis, two new approaches are
proposed to compensate for inter-session variability, viz., feature-level fusion and
model-level fusion. These two approaches reduce the degradation in the performance of the person recognition system due to inter-session variability and make the system more robust. | |