
dc.contributor.advisor: Patil, Hemant A.
dc.contributor.author: Sailor, Hardik Bhupendra
dc.date.accessioned: 2017-06-10T14:40:53Z
dc.date.available: 2017-06-10T14:40:53Z
dc.date.issued: 2013
dc.identifier.citation: Sailor, Hardik Bhupendra (2013). Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems. Dhirubhai Ambani Institute of Information and Communication Technology, xii, 85 p. (Acc.No: T00413)
dc.identifier.uri: http://drsr.daiict.ac.in/handle/123456789/450
dc.description.abstract: As the use of Text-to-Speech (TTS) technology increases, there is high demand for TTS systems that can produce natural and intelligible voices in any environment. To improve a speech synthesis system, synthesized speech must be properly evaluated so that the gap between natural and synthetic speech can be identified and addressed by developing suitable methods in each modelling block of the TTS system. This thesis addresses the machine evaluation approach, known as the objective method, for measuring the speech quality of TTS voices. Both conventional techniques for evaluating the speech quality of TTS voices and recently proposed techniques are used. It is shown that conventional techniques such as PESQ and spectrogram analysis are not able to capture cues related to speech naturalness. Experimental results also show that distance-based objective measures using perceptual features, viz., Perceptual Cepstral Distance (PCD), are not appropriate for speech quality evaluation of TTS voices. To assess the naturalness of synthetic speech, a recently proposed method based on pitch (i.e., F0) information in the speech signal is used. Since the human speech production model is difficult to apply in speech synthesis systems, pitch or fundamental frequency (F0)-related features are used and their direct correlation with subjective scores is obtained. Results on the Blizzard Challenge speech database show the potential of these features, with a correlation coefficient of 0.59; however, this still needs to be improved. For speech intelligibility, a simple phone recognition method is developed, and experiments on CMU ARCTIC data show a good correlation coefficient of -0.77 with the MCD measure, a commonly used measure of speech quality in TTS. As part of the work of the TTS team at DA-IICT, a TTS system for the Gujarati language is developed so that users can communicate with the machine in their native language.
All objective measures discussed in this thesis are applied and compared with subjective scores. Based on the experiments, it is evident that objective measures are applicable only to Statistical Parametric Speech Synthesis (SPSS) systems and related technologies, since in unit-selection-based TTS the speech output is a concatenated version of natural speech sound units.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Speech processing systems
dc.subject: Speech synthesis
dc.subject: Method
dc.subject: Speech Quality Management
dc.classification.ddc: 006.454 SAI
dc.title: Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 201111037
dc.accession.number: T00413

