    Design of syllable-based speech segmentation methods for text-to-speech (TTS) synthesis system for Gujarati

    File
    201111046.pdf (2.570 MB)
    Date
    2013
    Author
    Talesara, Swati
    Abstract
    Text-to-speech (TTS) synthesizers have proved to be a useful aid for visually challenged people, allowing them to read through auditory feedback. Although TTS synthesizers are available for English and other languages, it has been observed that people are more comfortable learning in their own native language. With this in mind, a Gujarati TTS synthesizer has been built, and the building process is described in this thesis. The system is built in the Festival speech synthesis framework. Because Indian languages are syllabic in nature, the syllable is taken as the basic speech sound unit. However, building this unit-selection-based Gujarati TTS system requires a large labeled Gujarati corpus, and labeling is the most time-consuming and tedious step, demanding considerable manual effort. This thesis attempts to reduce that effort by automatically generating a labeled corpus at the syllable level. To that end, a Gaussian-based segmentation method is proposed for automatic syllable-level segmentation of speech. The percentage of correctly labeled data is around 80 % for both male and female voices, compared with 70 % for group delay-based labeling. In addition, the system built with the proposed approach shows better intelligibility when evaluated by a visually challenged subject: the word error rate is reduced by 5 % for the Gaussian filter-based TTS system compared with the group delay-based TTS system, and a 5 % increase is observed in correctly synthesized words. The main focus of this thesis has been to reduce the manual effort required to build a TTS system for Gujarati, namely the effort needed to label the speech data.
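    The abstract does not describe how the Gaussian-based segmentation works. As a rough illustration only, one common way to place syllable boundaries automatically is to compute the short-term energy of the speech, smooth that contour with a Gaussian filter, and mark boundaries at its valleys. The sketch below assumes that approach; it is not the thesis's implementation, and the function names, the 20 ms frame / 10 ms hop, the filter width `sigma_frames`, and the 0.5 relative valley threshold are all hypothetical choices.

```python
import math
from typing import List


def short_term_energy(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Sum of squared samples in each analysis frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized Gaussian weights, truncated at +/- 3 sigma."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values: List[float], kernel: List[float]) -> List[float]:
    """Convolve the contour with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def valley_frames(contour: List[float], rel_threshold: float = 0.5) -> List[int]:
    """Centre frame of each local minimum (flat runs allowed) lying below
    rel_threshold * max(contour); the threshold (hypothetical) suppresses
    shallow dips inside a syllable."""
    if not contour:
        return []
    limit = rel_threshold * max(contour)
    valleys = []
    n = len(contour)
    i = 1
    while i < n - 1:
        if contour[i] < contour[i - 1]:
            j = i
            # Extend over a flat run of equal values.
            while j + 1 < n and contour[j + 1] == contour[i]:
                j += 1
            if j + 1 < n and contour[j + 1] > contour[i] and contour[i] < limit:
                valleys.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return valleys


def syllable_boundaries(signal: List[float], sample_rate: int,
                        frame_ms: float = 20.0, hop_ms: float = 10.0,
                        sigma_frames: float = 2.0) -> List[float]:
    """Hypothetical helper: boundary times (seconds) at valleys of the
    Gaussian-smoothed short-term energy contour."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    energy = short_term_energy(signal, frame_len, hop)
    contour = smooth(energy, gaussian_kernel(sigma_frames))
    # Report each boundary at the centre of its frame.
    return [(f * hop + frame_len / 2) / sample_rate for f in valley_frames(contour)]
```

    For example, three 0.2 s tone bursts separated by 0.1 s of silence at 8 kHz yield two boundaries, at 0.25 s and 0.55 s (the centres of the gaps). A wider Gaussian suppresses more spurious dips inside a syllable but can merge short syllables; real speech would also need the boundaries aligned to the transcription before use as labels.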
    URI
    http://drsr.daiict.ac.in/handle/123456789/458
    Collections
    • M Tech Dissertations [923]

    Related items

    Showing items related by title, author, creator and subject.

    • Auditory representation learning 

      Sailor, Hardik B. (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
      Representation learning (RL) or feature learning has a huge impact in the field of signal processing applications. The goal of the RL approaches is to learn the meaningful representation directly from the data that can be ...
    • Gaussian mixture models for spoken language identification 

      Manwani, Naresh (Dhirubhai Ambani Institute of Information and Communication Technology, 2006)
      Language Identification (LID) is the problem of identifying the language of any spoken utterance irrespective of the topic, speaker or the duration of the speech. Although a very large amount of work has been done for ...
    • Objective evaluation of speech quality of text-to-speech (TTS) synthesis systems 

      Sailor, Hardik Bhupendra (Dhirubhai Ambani Institute of Information and Communication Technology, 2013)
      Since the use of Text-to-Speech (TTS) technology is increasing, there is a high demand for TTS systems that can produce natural and intelligible voice in any environment. In order to improve speech synthesis system, synthesized ...

    Resource Centre copyright © 2006-2017 
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     

