dc.description.abstract | Speech is a powerful mode of communication among people. During the last few decades, there has been growing interest in speech-related research all over the world. In developing algorithms for automatic speech recognition (ASR) systems, independence from language and accent is one of the important requirements. Hence, ASR based on automatic phonetic transcription (which is independent of language, accent and speaker) is a better idea.
In this thesis, two objectives, viz., obstruent classification and obstruent detection, are explored. To make clear the basic concepts related to obstruents (i.e., consonants produced by obstructing the airflow either completely or partially), various aspects of speech are discussed in detail. As a database is one of the important requirements for any experiment, details of the standard speech database (viz., TIMIT, used by most researchers all over the globe) are discussed. In addition, details of the speech corpora being developed by our (DA-IICT Prosody research) team in two Indian languages (viz., Gujarati and Marathi) are also discussed. As phonetic transcription is the core application of the research presented in this thesis, it is given special importance and explained in detail.
All the experiments are performed on the TIMIT database as well as on the speech database in two Indian languages (viz., Gujarati and Marathi) being developed by the DA-IICT DeitY Prosody research team. Experiments for the obstruent classification task are performed with a general method that uses modulation spectrogram-based features. Experiments for obstruent detection are performed using three methods, based on the STM contour, chaotic titration and Seneff’s auditory model.
Compared to our own database, we get more consistent classification results (i.e., classification of obstruents, stops and fricatives) using the TIMIT database. The reason is that the TIMIT database was developed by expert phoneticians, whereas our database was not. Also, as our database is still under development, fewer phoneme samples (i.e., phonetically transcribed data) were available. Because of the smaller amount of transcribed data (from our own database in Gujarati and Marathi), along with less variation in accent across speakers, the classification results obtained are better for our own database than for the TIMIT database (in which there is huge variation in accent across speakers).
We obtained good classification accuracy (i.e., around 90-99 %) using an optimum feature size of (modulation spectrogram-based feature obtained after feature reduction using HOSVD) and a 75:25 training-to-testing ratio. For the obstruent detection task, we obtained obstruent detection efficiencies of 77 %, 99.61 % and 97.61 % using the methods based on the STM contour, chaotic titration and Seneff’s auditory model (SAM), respectively. The results obtained using the latter two methods are better than those of the STM contour, at the cost of a decrease in estimated probability.
From the present work, we can say that modulation spectrogram-based features can be a good option for obstruent classification; however, dimension reduction methods are needed to reduce the size of the feature vector obtained from the modulation spectrogram. As the present classification work was based on isolated obstruent speech segments, it can be extended to continuous speech. Seneff’s auditory model, with certain modifications and proper selection of parametric constants, can be used for the obstruent detection task. | |