Post-processing of speech signal for prosody modification and improvement

Dhoot, Kuldeep

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Dhoot, Kuldeep
dc.date.accessioned	2017-06-10T14:42:10Z
dc.date.available	2017-06-10T14:42:10Z
dc.date.issued	2014
dc.identifier.citation	Dhoot, Kuldeep (2014). Post-processing of speech signal for prosody modification and improvement. Dhirubhai Ambani Institute of Information and Communication Technology, xvi, 90 p. (Acc.No: T00478)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/515
dc.description.abstract	The basic task of a text-to-speech (TTS) synthesis system is to produce, by machine, a correct synthetic speech signal corresponding to the given input text. However, the main difficulty with a TTS system is obtaining appropriate prosody in the resulting speech signal. In this thesis, we use methods based on the pitch synchronous overlap-add (PSOLA) technique, i.e., time-domain PSOLA (TD-PSOLA) and linear prediction PSOLA (LP-PSOLA), which apply different combinations of pitch-scale and time-scale modification to match the synthesized speech to the natural speech. To implement the PSOLA techniques, different pitch detection algorithms are employed to obtain the pitch marks and the pitch contour. Pitch marking is an essential task for obtaining the required time-scale and pitch-scale modifications. Pitch detection algorithms based on the autocorrelation function (ACF), the normalized cross-correlation function (NCCF) and the zero frequency resonator (ZFR) are employed in this thesis. First, we applied the PSOLA methods to speech synthesized by unit selection synthesis (USS) and hidden Markov model-based TTS (HTS), for which the corresponding natural speech was available. Later, we applied the methods to the Blizzard Challenge-2012 speech corpus, for which the corresponding natural speech was not available. The PSOLA method is also applied to natural speech alone for time-scale and pitch-scale modification. Time-scale modification of natural speech has many real-world applications. A series of tests is then performed to determine the effectiveness of the PSOLA methods.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Signal Processing
dc.subject	Digital Techniques
dc.subject	Speech Signal Processing
dc.subject	Signals Processing
dc.classification.ddc	621.382 DHO
dc.title	Post-processing of speech signal for prosody modification and improvement
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201211050
dc.accession.number	T00478
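
The abstract names autocorrelation-based (ACF) pitch detection as one way of obtaining the pitch contour needed for PSOLA. The following is a minimal illustrative sketch of that general idea, not the thesis's implementation: the function name `estimate_pitch_acf`, the 60-400 Hz search range, and the single-frame interface are all assumptions made for this example. It picks the lag with the largest autocorrelation value inside the allowed pitch-period range and converts it to a frequency.

```python
import math


def estimate_pitch_acf(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 in Hz for one voiced frame from its autocorrelation peak.

    Hypothetical helper for illustration; the search range is an assumption.
    Returns None when the frame is silent or too short to search.
    """
    n = len(frame)
    if n == 0:
        return None

    # Remove the DC offset so a constant bias does not dominate the ACF.
    mean = sum(frame) / n
    x = [s - mean for s in frame]

    energy = sum(v * v for v in x)
    lag_min = int(sample_rate / f0_max)              # shortest period searched
    lag_max = min(int(sample_rate / f0_min), n - 1)  # longest period searched
    if energy == 0.0 or lag_min < 1 or lag_min >= lag_max:
        return None

    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag over the overlapping samples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_val, best_lag = r, lag

    if best_lag is None:
        return None
    return sample_rate / best_lag


# Usage: a synthetic 100 Hz tone, 40 ms long, sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 100.0 * i / 8000.0) for i in range(320)]
f0 = estimate_pitch_acf(tone, 8000)
```

A practical pitch tracker would add a voicing decision and smoothing of the contour across frames; the sketch only shows the lag search on a single frame.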

Files in this item

Name: 201211050.pdf
Size: 5.790Mb
Format: PDF


This item appears in the following Collection(s)

M Tech Dissertations [923]
