Design of syllable-based speech segmentation methods for text-to-speech (TTS) synthesis system for Gujarati

Talesara, Swati

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/458

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Talesara, Swati
dc.date.accessioned	2017-06-10T14:41:01Z
dc.date.available	2017-06-10T14:41:01Z
dc.date.issued	2013
dc.identifier.citation	Talesara, Swati (2013). Design of syllable-based speech segmentation methods for text-to-speech (TTS) synthesis system for Gujarati. Dhirubhai Ambani Institute of Information and Communication Technology, xii, 79 p. (Acc.No: T00421)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/458
dc.description.abstract	Text-to-speech (TTS) synthesizers have proved to be a useful aid for many visually challenged people, enabling them to read through auditory feedback. Although TTS synthesizers are available for English and other languages, people have been observed to feel more comfortable learning in their native language. With this in mind, a Gujarati TTS synthesizer has been built, and the building process is described in this thesis. The system is built in the Festival speech synthesis framework. The syllable is taken as the basic speech sound unit because Indian languages are syllabic in nature. However, building this unit-selection-based Gujarati TTS system requires a large labeled Gujarati corpus, and labeling is the most time-consuming and tedious task, requiring considerable manual effort. In this thesis, an attempt is made to reduce this manual effort by automatically generating a corpus labeled at the syllable level. To that end, a Gaussian-based segmentation method is proposed for automatic segmentation of speech at the syllable level. The percentage correctness of the labeled data is observed to be around 80% for both male and female voices, compared to 70% for group delay-based labeling. In addition, the system built with the proposed approach shows better intelligibility when evaluated by a visually challenged subject. The word error rate is reduced by 5% for the Gaussian filter-based TTS system compared to the group delay-based TTS system, and a 5% increase is observed in correctly synthesized words. The main focus of this thesis has been to reduce the manual effort required in building a TTS system for the Gujarati language, namely the effort of labeling speech data.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Speech synthesis
dc.subject	Text-to-Speech
dc.subject	Speech processing systems
dc.subject	Syllable Based Speech Segmentation
dc.classification.ddc	621.38 TAL
dc.title	Design of syllable-based speech segmentation methods for text-to-speech (TTS) synthesis system for Gujarati
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201111046
dc.accession.number	T00421
Appears in Collections:	M Tech Dissertations

Files in This Item:

File	Description	Size	Format
201111046.pdf (Restricted Access)		2.63 MB	Adobe PDF
