Vowel landmark detection for speech recognition

Undhad, Ankur G.

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/514

Title:	Vowel landmark detection for speech recognition
Authors:	Patil, Hemant A.; Undhad, Ankur G.
Keywords:	Speech recognition; Landmark; Vowel Landmark Detection
Issue Date:	2014
Publisher:	Dhirubhai Ambani Institute of Information and Communication Technology
Citation:	Undhad, Ankur G. (2014). Vowel landmark detection for speech recognition. Dhirubhai Ambani Institute of Information and Communication Technology, xviii, 89 p. (Acc.No: T00477)
Abstract:	Landmarks are time instants in a speech utterance that mark important events such as vowels, glides and consonants. This thesis proposes a novel Vowel Landmark Detection (VLD) algorithm to locate vowel landmarks and hence the nucleus of each vowel. VLD has potential applications in Automatic Speech Recognition (ASR) and Automatic Phonetic Segmentation (APS). The proposed VLD method uses speech source information to detect vowel landmarks, which are points of high sonority. The excitation peaks in the Hilbert envelope (HE) of the Teager energy profile of the zero frequency filtered (ZFF) speech signal can be interpreted as a perceptually significant feature that contributes to loudness. The performance of the proposed VLD method is compared with that of an existing loudness-based method. Results are reported on the TIMIT and NTIMIT corpora. The proposed VLD algorithm achieves a detection rate of 85.48 % (83.97 %), which is 5.06 % (7.51 %) higher than the existing loudness-based method on the TIMIT (NTIMIT) corpus. In addition, this thesis proposes the use of the VLD algorithm for two low-resource Indian languages, viz., Gujarati and Marathi. For both languages, results are reported on speech recorded in three modes, viz., read, spontaneous and lecture, with manual phonetic transcription by transcribers serving as ground truth. For Gujarati, the proposed VLD algorithm achieves detection rates of 78.92 %, 76.40 % and 73.89 % in lecture, spontaneous and read modes, respectively, which are 8.79 %, 7.23 % and 7.17 % higher than the loudness-based method. Similarly, for Marathi, it achieves detection rates of 76.93 %, 75.16 % and 73.93 % in lecture, spontaneous and read modes, respectively, which are 7.52 %, 7.43 % and 7.82 % higher than the loudness-based method. The proposed algorithm is also shown to be robust against signal degradation such as white noise.
The second part of the thesis recognizes the vowels at the detected landmarks. A formant-based technique is used for recognition. Results are reported on the phonetically transcribed TIMIT corpus. The recognition rate is 32.16 % on the correctly detected vowels (i.e., of 78374 vowels, 66994 are detected correctly, and 21545 of those are recognized). The proposed method is very fast and requires no training.
URI:	http://drsr.daiict.ac.in/handle/123456789/514
Appears in Collections:	M Tech Dissertations
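
The processing chain named in the abstract (zero frequency filtering, then the Teager energy profile, then the Hilbert envelope) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the 10 ms trend-removal window, the number of trend-removal passes, and the whole peak-picking step (50 ms smoothing, 0.3 relative threshold) are assumptions chosen only for illustration.

```python
import numpy as np


def zero_frequency_filter(speech, fs, mean_window_ms=10.0, passes=3):
    """ZFF sketch: two cascaded 0 Hz resonators followed by trend removal.

    mean_window_ms and passes are illustrative assumptions, not thesis values.
    """
    s = np.asarray(speech, dtype=np.float64)
    y = np.diff(s, prepend=s[0])           # difference to suppress DC bias
    for _ in range(4):                     # each 0 Hz resonator = double integrator
        y = np.cumsum(y)
    half = max(1, int(round(mean_window_ms * 1e-3 * fs / 2)))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):                # subtract local mean to remove the trend
        y = y - np.convolve(y, kernel, mode="same")
    edge = passes * half                   # zero the samples distorted by edge effects
    y[:edge] = 0.0
    y[-edge:] = 0.0
    return y


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=np.float64)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))


def vowel_landmark_evidence(speech, fs):
    """Hilbert envelope of the Teager energy profile of the ZFF signal."""
    return hilbert_envelope(teager_energy(zero_frequency_filter(speech, fs)))


def candidate_landmarks(envelope, fs, smooth_ms=50.0, rel_threshold=0.3):
    """Hypothetical peak picker: smooth the envelope, keep strong local maxima."""
    win = max(1, int(smooth_ms * 1e-3 * fs))
    contour = np.convolve(envelope, np.ones(win) / win, mode="same")
    mid = contour[1:-1]
    is_peak = (mid > contour[:-2]) & (mid >= contour[2:])
    strong = mid >= rel_threshold * contour.max()
    return np.flatnonzero(is_peak & strong) + 1   # sample indices of candidates
```

Under these assumptions, `candidate_landmarks(vowel_landmark_evidence(speech, fs), fs)` would return sample indices of candidate vowel landmarks for a mono signal `speech` sampled at `fs` Hz. Because the four cumulative sums grow quickly with signal length, the sketch is best suited to short utterances in double precision.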

Files in This Item:

File	Description	Size	Format
201211049.pdf (Restricted Access)		1.98 MB	Adobe PDF
