Pre-processing using outlier removal in voice conversion
Voice conversion (VC) is a technique that modifies speech spoken by one speaker so that it sounds as if the same sentence had been spoken by another speaker. In short, only the speaker's identity is converted, and the linguistic information from the source speaker remains unchanged. There are numerous methods for VC, each with its own strengths and limitations. This thesis addresses the problem of improving the quality of training by proposing a pre-processing method that removes undesired observations, known as outliers. By definition, outliers are observations (frames, in speech signal processing) that do not fit within the regularity of the dataset. The aim is to make the training phase of VC systems robust by eliminating such observations before estimating a mapping function. Two state-of-the-art statistical mapping techniques were implemented to test and compare the performance of the proposed approach: VC using Joint Density Gaussian Mixture Models (JD-GMM) and VC using Partial Least Squares (PLS) regression. The concept and effect of outliers are studied further in this thesis. VC systems are evaluated with a set of standard objective and subjective measures, and the performance of the VC systems is compared on both sets of measures.
- M Tech Dissertations