Person recognition using humming, singing and speech

Chhayani, Nirav Hitendrabhai

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Chhayani, Nirav Hitendrabhai
dc.date.accessioned	2017-06-10T14:40:46Z
dc.date.available	2017-06-10T14:40:46Z
dc.date.issued	2013
dc.identifier.citation	Chhayani, Nirav Hitendrabhai (2013). Person recognition using humming, singing and speech. Dhirubhai Ambani Institute of Information and Communication Technology, xiv, 72 p. (Acc.No: T00407)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/444
dc.description.abstract	In this thesis, a person recognition system is designed for three different speech-related biometric signals, i.e., humming, singing and normal speech. As humming is a nasalised sound, we have adopted Mel filterbank-based features for the person recognition task rather than the LP (Linear Prediction) model. This thesis work is done to observe which of the three biometric patterns performs best for the person recognition task. As we found that the person-specific information is not the same in any two of these biometric signals, the performance of each signal has to be observed separately. The very first task in designing any person recognition system is data collection and corpus design. Hence, in this thesis, a corpus is first designed for humming, singing and speech. For data collection, a total of 50 subjects are selected for recording. Data collection is done in 4 different sessions for each subject in order to capture intersession variability. Each session consists of recordings of humming, singing and speech. Following data collection, feature extraction is done with a Mel filterbank, which follows human auditory perception, so Mel Frequency Cepstral Coefficients (MFCC) are used as the state-of-the-art feature set. Using this filterbank, experiments are carried out for intersession as well as intrasession training-testing sets. After that, noise is added to the database and the results are compared to observe the effect of noise, i.e., to evaluate the robustness of the system under noisy conditions. The feature extraction process is then modified using the TEO (Teager Energy Operator) to obtain Teager Energy Based MFCC (T-MFCC) features. Results for the T-MFCC features are compared with the results for the MFCC feature set. Score-level fusion of the T-MFCC and MFCC feature sets is also performed and the corresponding results are observed.
These observations lead us to the conclusion that score-level fusion of MFCC and T-MFCC performs better than either feature set individually, i.e., this type of score-level fusion increases the performance of the system. Performance is measured for different values of the fusion weight, and the optimum fusion weight is determined for humming, singing and speech signals. The effect of feature dimension as well as the order of the classifier is also observed for the intersession experiment. After these studies, an inter-biometric-type experiment is performed. Based on the results obtained in this experiment, Fisher’s F-ratio is determined for all three biometric patterns (i.e., humming, singing and speech). A new filterbank structure is proposed for all three biometric patterns. The system performance is also measured for this new filterbank and compared with all the previous experiments. In all these experiments, the person-specific model is generated using a polynomial classifier. This classifier considers out-of-class information while creating the person-specific model. The experiments are reported for different performance evaluation factors, for example, the effect of polynomial classifier order, the dimension of the feature vector, and noisy environments. To evaluate performance, DET (Detection Error Tradeoff) curves are used; this is a NIST-standardized, widely accepted performance evaluation measure for speaker recognition applications.
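Three of the operations named in the abstract (the Teager Energy Operator, weighted score-level fusion, and Fisher's F-ratio) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the exact T-MFCC pipeline, the fusion formula, or the F-ratio definition used, so the convex-combination fusion and the per-dimension F-ratio below are assumptions, and the function names `teager_energy`, `fuse_scores` and `f_ratio` are hypothetical.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the two end samples have no neighbour.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def fuse_scores(mfcc_scores, tmfcc_scores, weight):
    """Score-level fusion as a convex combination (assumed form).

    `weight` in [0, 1] is applied to the MFCC scores and (1 - weight)
    to the T-MFCC scores.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mfcc_scores = np.asarray(mfcc_scores, dtype=float)
    tmfcc_scores = np.asarray(tmfcc_scores, dtype=float)
    return weight * mfcc_scores + (1.0 - weight) * tmfcc_scores


def f_ratio(features, labels):
    """Fisher's F-ratio per feature dimension (one common definition, assumed):
    variance of the class means divided by the mean within-class variance.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    class_means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    within_var = np.stack([features[labels == c].var(axis=0) for c in classes])
    return class_means.var(axis=0) / within_var.mean(axis=0)
```

For a pure sinusoid A cos(ωn), `teager_energy` returns the constant A² sin²(ω), which is why the operator is often described as tracking both amplitude and frequency. Sweeping `weight` over a grid and scoring each fused output would be one way to look for an optimum fusion weight of the kind the abstract mentions.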
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Biometric Signals
dc.subject	Person Recognition
dc.subject	Speech Humming
dc.subject	Singing Recognition
dc.subject	Speech processing systems
dc.classification.ddc	006.454 CHH
dc.title	Person recognition using humming, singing and speech
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201111031
dc.accession.number	T00407

Files in this item

Name: 201111031.pdf
Size: 2.469Mb
Format: PDF


This item appears in the following Collection(s)

M Tech Dissertations [923]
