dc.description.abstract | Accurate estimation of source excitation features is important in many speech analysis-synthesis applications. According to source-filter theory, separating the source features from the speech signal is essentially a deconvolution problem [1]. For voiced speech, the vocal tract is excited by a sequence of impulse-like glottal pulses, and the excitation is strongest around these pulses. The work presented in this thesis aims to estimate the instants of significant excitation of the vocal tract system, which occur at the glottal closure instants (GCIs), also known as epochs [2]. Unlike conventional methods, the proposed method does not require modelling of the vocal tract system for epoch estimation and thus does not use the traditional linear prediction residual (LPR).
The proposed epoch extraction method applies a lowpass filter to the positively clipped and negated speech signal, followed by peak detection. The method assumes that the speech signal is quasi-periodic. The lowpass filtering removes the vocal tract characteristics from the speech signal, and peak detection then identifies the epoch candidates. The method has been evaluated on a phonetically balanced database and compared with other state-of-the-art methods, viz., Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Zero Frequency Resonator (ZFR)-based method. The proposed method gave comparable or better results on clean as well as noisy speech signals.
In addition, we propose an event-based approach for pitch estimation that uses the estimated epoch locations. We also present an approach for evaluating the performance of a pitch estimation algorithm in the absence of ground truth. The proposed pitch estimation approach has been compared with other state-of-the-art pitch extraction methods in the framework of voice conversion. | |