Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems

Sailor, Hardik Bhupendra

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/450

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Sailor, Hardik Bhupendra
dc.date.accessioned	2017-06-10T14:40:53Z
dc.date.available	2017-06-10T14:40:53Z
dc.date.issued	2013
dc.identifier.citation	Sailor, Hardik Bhupendra (2013). Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems. Dhirubhai Ambani Institute of Information and Communication Technology, xii, 85 p. (Acc.No: T00413)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/450
dc.description.abstract	As the use of Text-to-Speech (TTS) technology increases, there is high demand for TTS systems that can produce natural and intelligible speech in any environment. To improve a speech synthesis system, the synthesized speech must be properly evaluated. This identifies the gap between natural and synthetic speech, which can then be addressed by developing suitable methods for each modelling block of the TTS system. This thesis addresses machine evaluation, known as the objective method, for measuring the speech quality of TTS voices. Both conventional and recently proposed techniques for evaluating the speech quality of TTS voices are used. It is shown that conventional techniques such as PESQ and spectrogram analysis cannot capture cues related to speech naturalness. Experimental results also show that distance-based objective measures using perceptual features, viz., Perceptual Cepstral Distance (PCD), are not appropriate for evaluating the speech quality of TTS voices. To assess the naturalness of synthetic speech, a recently proposed method based on the pitch (i.e., F0) information in the speech signal is used. Since the human speech production model is difficult to apply in speech synthesis systems, pitch or fundamental frequency (F0)-related features are used, and their direct correlation with subjective scores is computed. Results on the Blizzard Challenge speech database show the potential of these features, with a correlation coefficient of 0.59, although this still needs to be improved. For speech intelligibility, a simple phone recognition method is developed. Experiments on CMU ARCTIC data show a good correlation coefficient of -0.77 with Mel-Cepstral Distortion (MCD), a measure commonly used for speech quality in TTS. As part of the TTS team at DA-IICT, a TTS system for the Gujarati language is developed so that users can communicate with machines in their native language.
All objective measures discussed in this thesis are applied and compared with subjective scores. Based on the experiments, it is evident that objective measures are applicable only to Statistical Parametric Speech Synthesis (SPSS) systems and related technologies. This is because, in unit-selection-based TTS, the speech output is a concatenation of natural speech sound units.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Speech processing systems
dc.subject	Speech synthesis
dc.subject	Method
dc.subject	Speech Quality Management
dc.classification.ddc	006.454 SAI
dc.title	Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201111037
dc.accession.number	T00413
Appears in Collections:	M Tech Dissertations
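The abstract evaluates objective measures by their correlation with subjective scores and refers to MCD as a common TTS quality measure. The following is a minimal sketch, not the thesis's implementation, of how these two quantities are usually computed. It assumes time-aligned mel-cepstral frames with the energy coefficient c0 already removed and no DTW alignment, and it uses the Pearson correlation coefficient. The function names are hypothetical.

```python
import math


def mel_cepstral_distortion(ref, syn):
    """Mean MCD in dB over time-aligned frames.

    Assumption: ref and syn are equal-length lists of mel-cepstral vectors
    (c0 excluded) that are already frame-aligned; no DTW is performed here.
    """
    if len(ref) != len(syn) or not ref:
        raise ValueError("need equal, non-zero numbers of aligned frames")
    k = 10.0 / math.log(10.0)  # converts the log-spectral distance to dB
    total = 0.0
    for r, s in zip(ref, syn):
        squared = sum((a - b) ** 2 for a, b in zip(r, s))
        total += k * math.sqrt(2.0 * squared)
    return total / len(ref)


def pearson(x, y):
    """Pearson correlation between objective scores x and subjective scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)
```

Identical inputs give an MCD of 0.0. A list of per-utterance MCD values could then be passed to `pearson` together with the matching subjective or recognition scores. The sign of the result depends on whether a higher score means better or worse quality.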

Files in This Item:

File	Description	Size	Format
201111037.pdf Restricted Access		2.3 MB	Adobe PDF
