Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems

Sailor, Hardik Bhupendra

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/450

Title:	Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems
Authors:	Patil, Hemant A.; Sailor, Hardik Bhupendra
Keywords:	Speech processing systems Speech synthesis Method Speech Quality Management
Issue Date:	2013
Publisher:	Dhirubhai Ambani Institute of Information and Communication Technology
Citation:	Sailor, Hardik Bhupendra (2013). Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems. Dhirubhai Ambani Institute of Information and Communication Technology, xii, 85 p. (Acc.No: T00413)
Abstract:	As the use of Text-to-Speech (TTS) technology increases, there is high demand for TTS systems that can produce natural and intelligible voices in any environment. To improve a speech synthesis system, synthesized speech must be properly evaluated so that the gap between natural and synthetic speech can be identified and addressed by developing suitable methods in each modelling block of the TTS system. This thesis addresses the machine evaluation approach, known as the objective method, for measuring the speech quality of TTS voices. Both conventional techniques for evaluating the speech quality of TTS voices and recently proposed techniques are used. It is shown that conventional techniques such as PESQ and spectrogram analysis are not able to account for cues related to speech naturalness. Experimental results also show that distance-based objective measures using perceptual features, viz., Perceptual Cepstral Distance (PCD), are not appropriate for speech quality evaluation of TTS voices. To assess the naturalness of synthetic speech, a recently proposed method based on pitch (i.e., F0) information in the speech signal is used. Since the human speech production model is difficult to apply in speech synthesis systems, pitch or fundamental frequency (F0)-related features are used and their direct correlation with subjective scores is obtained. Results on the Blizzard Challenge speech database show the potential of these features, with a correlation coefficient of 0.59; however, this still needs to be improved. For speech intelligibility, a simple phone recognition method is developed in this thesis, and experiments on CMU ARCTIC data show a good correlation coefficient of -0.77 with the MCD measure, a commonly used measure of speech quality in TTS. As part of the TTS team at DA-IICT, a TTS system for the Gujarati language is developed so that users can communicate with a machine in their native language.
All objective measures discussed in this thesis are applied and compared with subjective scores. Based on the experiments, it is evident that objective measures are applicable only to Statistical Parametric Speech Synthesis (SPSS) systems and related technologies, since in unit-selection-based TTS the speech output is a concatenation of natural speech sound units.
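The abstract reports correlation coefficients between objective measures and other scores but, in this record, does not spell out how they are computed. The following is a minimal sketch, not the thesis's actual procedure: it assumes the widely used mel-cepstral distortion formula for MCD (frames already time-aligned, 0th coefficient excluded) and a plain Pearson correlation. The function names and all numeric values are hypothetical and serve only as an illustration.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Assumption: both inputs are lists of equal-length cepstral vectors that
    are already aligned frame by frame (no DTW is performed here). The 0th
    coefficient (energy) is skipped, as is common practice.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need the same non-zero number of frames")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-system scores, invented purely for illustration.
    objective_scores = [5.1, 5.8, 6.4, 7.0]   # e.g. MCD in dB (lower is better)
    subjective_scores = [3.9, 3.5, 3.1, 2.6]  # e.g. mean opinion scores
    print(pearson_correlation(objective_scores, subjective_scores))
```

Because a lower distortion means better quality, a useful distance-type measure would be expected to correlate negatively with listener ratings; the sign of a reported coefficient therefore depends on which two quantities are being compared.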
URI:	http://drsr.daiict.ac.in/handle/123456789/450
Appears in Collections:	M Tech Dissertations

Files in This Item:

File	Description	Size	Format
201111037.pdf Restricted Access		2.3 MB	Adobe PDF
