• Login
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Browse

    All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    LoginRegister

    Statistics

    View Usage StatisticsView Google Analytics Statistics

    Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems

    Thumbnail
    View/Open
    201111037.pdf (2.248Mb)
    Date
    2013
    Author
    Sailor, Hardik Bhupendra
    Metadata
    Show full item record
    Abstract
    Since the use of Text-to-Speech (TTS) technology is increasing, there is a high demand of TTS system that can produce natural and intelligible voice in any environments. In order to improve speech synthesis system, synthesized speech must be properly evaluated so that the gap of natural speech and synthetic speech can be identified and should be taken care by developing proper methods in each modelling block of TTS systems. This thesis addresses machine evaluation approach known as objective method for speech quality measurement of TTS voice. In this thesis work, conventional techniques for evaluating speech quality of TTS voice as well as recently proposed techniques are used. It has been shown that the conventional techniques like PESQ, spectrogram analysis are not able to justify cues related to speech naturalness. Also, experimental results show that distance-based objective measures using perceptual features, viz., Perceptual Cepstral Distance (PCD) are not appropriate for speech quality evaluation of TTS voice. In order to justify speech naturalness of synthetic speech, recently proposed method based on pitch (i.e., F0) information in speech signal is used. Since the human speech production model is difficult to apply in speech synthesis systems, pitch or fundamental frequency (F0)-related features are used and their direct correlation with subjective scores is obtained. The results on Blizzard challenge speech database shows potential of these features with correlation coefficient of 0.59, however, still it needs to be improved. For speech intelligibility, in this thesis work simple phone recognition method is developed and experiments on CMU ARCTIC data shows good correlation coefficient of -0.77 with MCD measure-generally common measure for speech quality in TTS. As a part of TTS team at DA-IICT, TTS in Gujarati language is developed so that users can be able to communicate with machine in his or her native language. All objective measures discussed in this thesis are applied and compared with subjective scores. Based on experiments, it is evident that objective measures are used only for Statistical Parametric Speech Synthesis (SPSS) system and related technologies since in unit-selection-based TTS, speech output is concatenated version of natural speech sound units.
    URI
    http://drsr.daiict.ac.in/handle/123456789/450
    Collections
    • M Tech Dissertations [923]

    Related items

    Showing items related by title, author, creator and subject.

    • Design of syllable-based speech segmentation methods for text-to-speech (TTS) synthesis system for Gujarati 

      Talesara, Swati (Dhirubhai Ambani Institute of Information and Communication Technology, 2013)
      Text-to-speech (TTS) synthesizer has been proved to be an aiding tool for many visually challenged people for reading through hearing feedback. Although there are TTS synthesizers available in English and other languages ...
    • Auditory representation learning 

      Sailor, Hardik B. (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
      Representation learning (RL) or feature learning has a huge impact in the field of signal processing applications. The goal of the RL approaches is to learn the meaningful representation directly from the data that can be ...
    • Gaussian mixture models for spoken language identification 

      Manwani, Naresh (Dhirubhai Ambani Institute of Information and Communication Technology, 2006)
      Language Identification (LID) is the problem of identifying the language of any spoken utterance irrespective of the topic, speaker or the duration of the speech. Although A very huge amount of work has been done for ...

    Resource Centre copyright © 2006-2017 
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     


    Resource Centre copyright © 2006-2017 
    Contact Us | Send Feedback
    Theme by 
    Atmire NV