Studies on transcription, classification and detection of obstruents

Malde, Kewal Dhiraj

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/457

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Malde, Kewal Dhiraj
dc.date.accessioned	2017-06-10T14:41:00Z
dc.date.available	2017-06-10T14:41:00Z
dc.date.issued	2013
dc.identifier.citation	Malde, Kewal Dhiraj (2013). Studies on transcription, classification and detection of obstruents. Dhirubhai Ambani Institute of Information and Communication Technology, xxxiv, 148 p. (Acc.No: T00420)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/457
dc.description.abstract	Speech is a powerful mode of communication among people. During the last few decades, there has been growing interest in speech-related research all over the world. In developing algorithms for automatic speech recognition (ASR) systems, independence from language and accent is one of the important requirements. Hence, ASR based on automatic phonetic transcription (which is independent of language, accent and speaker) is a better approach. In this thesis, two objectives, viz., obstruent classification and obstruent detection, are explored. To make clear the basic concepts related to obstruents (i.e., consonants produced by obstructing the airflow either completely or partially), various aspects of speech are discussed in detail. As a database is one of the important requirements for any experiment, details of the standard speech database (viz., TIMIT, used by most researchers all over the globe) are discussed. In addition, details of the speech corpora being developed by our (DA-IICT Prosody research) team in two Indian languages (viz., Gujarati and Marathi) are also discussed. As phonetic transcription is the core application of the research work in this thesis, it is given special importance and explained in detail. All the experiments are performed on the TIMIT database as well as on the speech database in two Indian languages (viz., Gujarati and Marathi) being developed by the DA-IICT DeitY Prosody research team. Experiments for the obstruent classification task are performed using a general method based on modulation spectrogram-based features. Experiments for obstruent detection are performed using three methods, based on the STM contour, chaotic titration and Seneff’s auditory model. Compared with our own database, we get consistent classification results (i.e., classification of obstruents, stops and fricatives) using the TIMIT database.
The reason for this is that the TIMIT database was developed by expert phoneticians, whereas our database was not. Also, as our database is still under development, there were fewer phoneme samples (i.e., phonetically transcribed data). Owing to the smaller amount of transcribed data (from our own database in Gujarati and Marathi), along with less variation in accent across speakers, the classification results obtained are better for our own database than for the TIMIT database (in which there is huge variation in accent across speakers). We obtained good classification accuracy (i.e., around 90-99 %) using an optimum feature size of (modulation spectrogram-based feature obtained after feature reduction using HOSVD) and a 75:25 % training-to-testing ratio. For the obstruent detection task, we obtained detection efficiencies of 77 %, 99.61 % and 97.61 % using the methods based on the STM contour, chaotic titration and Seneff’s auditory model (SAM), respectively. The results obtained using the latter two methods are better than those of the STM contour method, at the cost of a decrease in estimated probability. From the present work, we can say that modulation spectrogram-based features can be a good option for obstruent classification; however, dimension reduction methods are needed to reduce the size of the feature vector obtained from the modulation spectrogram. As the present classification work was based on isolated obstruent speech segments, it can be extended to continuous speech. Seneff’s auditory model, with certain modifications and proper selection of the parametric constants, can be used for the obstruent detection task.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Speech processing systems
dc.subject	Sound Processing
dc.subject	Voice Recognition
dc.classification.ddc	621.3828 MAL
dc.title	Studies on transcription, classification and detection of obstruents
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201111045
dc.accession.number	T00420
Appears in Collections:	M Tech Dissertations

Files in This Item:

File	Description	Size	Format
201111045.pdf	Restricted Access	4.4 MB	Adobe PDF
