Person recognition from their hum

Madhavi, Maulik C.

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/338

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Madhavi, Maulik C.
dc.date.accessioned	2017-06-10T14:38:58Z
dc.date.available	2017-06-10T14:38:58Z
dc.date.issued	2011
dc.identifier.citation	Madhavi, Maulik C. (2011). Person recognition from their hum. Dhirubhai Ambani Institute of Information and Communication Technology, xiv, 99 p. (Acc.No: T00301)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/338
dc.description.abstract	This thesis presents the design of a person recognition system based on a person's hum. Because a hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, the approach in this work uses Mel filterbank-based cepstral features for the person recognition task. The first task consisted of data collection and corpus design for humming; for this purpose, humming of old Hindi songs from around 170 subjects is used. Feature extraction schemes were then developed. Since the Mel filterbank follows human auditory perception, MFCC was used as the state-of-the-art feature set. The filterbank structure was then modified to compute Gaussian Mel scale-based MFCC (GMFCC) and Inverse Mel scale-based MFCC (IMFCC) feature sets. The thesis mainly explores two features. The first, MFCC-VTMP, captures phase information via MFCC using the VTEO (Variable length Teager Energy Operator) in the time domain. The second, Variable length Teager Energy Operator based MFCC (VTMFCC), captures vocal-source information. The proposed MFCC-VTMP feature set has two characteristics: it captures phase information, and it uses the properties of the VTEO. The VTEO is an extension of the TEO (Teager Energy Operator) and is a nonlinear energy-tracking operator. Feature sets such as VTMFCC capture vocal-source information, which reflects the excitation mechanism in the speech (hum) production process. This information is found to be complementary to the vocal tract information, so score-level fusion of the different source and system features improves person recognition performance.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Speech perception
dc.subject	Speech processing systems
dc.subject	Speech recognition
dc.subject	Acoustic pattern recognition
dc.subject	MFCC
dc.subject	Hidden Markov Model
dc.subject	Speaker recognition
dc.subject	Text independent
dc.subject	Vector quantization
dc.subject	Gaussian mixture model
dc.subject	Pattern recognition systems
dc.subject	Biometric identification
dc.classification.ddc	006.45 MAD
dc.title	Person recognition from their hum
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	200911036
dc.accession.number	T00301
Appears in Collections:	M Tech Dissertations
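
Illustrative note (not part of the thesis record): the abstract describes the VTEO as a nonlinear energy-tracking extension of the TEO and mentions score-level fusion of source and system features. The sketch below is a minimal illustration of both ideas, not the thesis implementation. It assumes the commonly cited form of the operator, psi_i[x(n)] = x(n)^2 - x(n-i) * x(n+i), which reduces to the TEO at i = 1, and it assumes a simple weighted-sum fusion rule. The names `vteo`, `fuse_scores`, and `weight`, and the test tone parameters, are hypothetical.

```python
import math


def vteo(x, i=1):
    """Variable length Teager Energy Operator with dependency index i.

    Assumed definition: psi_i[x(n)] = x(n)^2 - x(n - i) * x(n + i).
    With i = 1 this is the standard TEO. Only samples that have both
    neighbours (n - i and n + i) inside the signal are returned.
    """
    if i < 1:
        raise ValueError("dependency index i must be >= 1")
    return [x[n] * x[n] - x[n - i] * x[n + i] for n in range(i, len(x) - i)]


def fuse_scores(system_score, source_score, weight=0.5):
    """Weighted-sum score-level fusion.

    'weight' is a hypothetical fusion parameter in [0, 1]; the abstract
    does not state which fusion rule or weights the thesis uses.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * system_score + (1.0 - weight) * source_score


# For a pure tone A*cos(omega*n), the operator output is the constant
# A^2 * sin^2(i*omega), which is why it is said to track signal energy.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]
teo_out = vteo(tone, i=1)   # standard TEO
vteo_out = vteo(tone, i=3)  # VTEO with a longer dependency index
```

On the pure tone above, every output sample of `vteo(tone, i)` equals A^2 * sin^2(i * omega) up to rounding, so the output depends on both the amplitude and the frequency of the tone. In a feature pipeline of the kind the abstract outlines, such an energy profile would be passed on to Mel filterbank and cepstral processing; that stage is omitted here.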

Files in This Item:

File	Description	Size	Format
200911036.pdf (Restricted Access)		3.13 MB	Adobe PDF
