Vowel landmark detection for speech recognition

Undhad, Ankur G.

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Undhad, Ankur G.
dc.date.accessioned	2017-06-10T14:42:08Z
dc.date.available	2017-06-10T14:42:08Z
dc.date.issued	2014
dc.identifier.citation	Undhad, Ankur G. (2014). Vowel landmark detection for speech recognition. Dhirubhai Ambani Institute of Information and Communication Technology, xviii, 89 p. (Acc.No: T00477)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/514
dc.description.abstract	Landmarks are time instants in a speech utterance that mark important events such as vowels, glides, and consonants. This thesis proposes a novel Vowel Landmark Detection (VLD) algorithm that locates vowel landmarks and hence the nucleus of each vowel. VLD has potential applications in Automatic Speech Recognition (ASR) and Automatic Phonetic Segmentation (APS). The proposed VLD method uses speech source information to detect vowel landmarks, which are points of high sonority. The excitation peaks in the Hilbert envelope (HE) of the Teager energy profile of the zero frequency filtered (ZFF) speech signal can be interpreted as a perceptually significant feature that contributes to loudness. The performance of the proposed VLD method is compared with an existing loudness-based method, and results are reported on the TIMIT and NTIMIT corpora. The proposed VLD algorithm achieves a detection rate of 85.48 % on TIMIT and 83.97 % on NTIMIT, which is 5.06 % and 7.51 % higher, respectively, than the existing loudness-based method. In addition, this thesis proposes the use of the VLD algorithm for two low-resource Indian languages, Gujarati and Marathi. Results are reported on speech recorded in three modes (read, spontaneous, and lecture), with manual phonetic transcriptions produced by transcribers serving as ground truth for both languages. For Gujarati, the proposed VLD algorithm achieves detection rates of 78.92 %, 76.40 %, and 73.89 % in lecture, spontaneous, and read mode, respectively, which is 8.79 %, 7.23 %, and 7.17 % higher than the loudness-based method. Similarly, for Marathi, it achieves detection rates of 76.93 %, 75.16 %, and 73.93 % in lecture, spontaneous, and read mode, respectively, which is 7.52 %, 7.43 %, and 7.82 % higher than the loudness-based method. The proposed algorithm is also shown to be robust against signal degradation such as white noise.
The second part of the thesis addresses recognition of the detected vowel landmarks. A formant-based technique is used to recognize the detected vowels, and results are reported on the phonetically transcribed TIMIT corpus. The recognition rate is 32.16 % on the correctly detected vowels (i.e., of 78374 vowels, 66994 are detected correctly, and of those, 21545 are recognized). The proposed method is very fast and requires no training.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Speech recognition
dc.subject	speech Recognition Landmark
dc.subject	Vowel Landmark Detection
dc.classification.ddc	621.3822 UND
dc.title	Vowel landmark detection for speech recognition
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201211049
dc.accession.number	T00477
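
Note on the figures in the abstract: the vowel counts quoted for TIMIT can be checked against the reported percentages with a few lines of arithmetic. The sketch below is only an illustration. The variable names and the `percent` helper are hypothetical, and the "end-to-end" figure is derived here from the quoted counts rather than stated in the abstract.

```python
# Counts quoted in the abstract for the phonetically transcribed TIMIT corpus.
total_vowels = 78374        # vowels in the transcription
detected_correctly = 66994  # vowels whose landmark was detected correctly
recognized = 21545          # correctly detected vowels that were also recognized


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals (hypothetical helper)."""
    return round(100.0 * part / whole, 2)


# Share of all vowels that were detected correctly.
detection_rate = percent(detected_correctly, total_vowels)

# Share of correctly detected vowels that were recognized (the rate the abstract reports).
recognition_rate = percent(recognized, detected_correctly)

# Derived here, not stated in the abstract: share of all vowels both detected and recognized.
end_to_end_rate = percent(recognized, total_vowels)

print(detection_rate, recognition_rate, end_to_end_rate)
```

Under these counts the recognition rate comes out at the quoted 32.16 %, and the detection rate computed from the same counts agrees with the 85.48 % reported for TIMIT.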

Files in this item

Name: 201211049.pdf
Size: 1.937Mb
Format: PDF


This item appears in the following Collection(s)

M Tech Dissertations [923]
