A novel approach to remove outliers for parallel voice conversion

Shah, Nirmesh J; Patil, Hemant

dc.contributor.affiliation	DA-IICT, Gandhinagar
dc.contributor.author	Shah, Nirmesh J
dc.contributor.author	Patil, Hemant
dc.contributor.researcher	Shah, Nirmesh J (201321009)
dc.date.accessioned	2025-08-01T13:09:01Z
dc.date.issued	2019-11-01
dc.description.abstract	Alignment is a key step before learning a mapping function between a source and a target speaker's spectral features in various state-of-the-art parallel-data Voice Conversion (VC) techniques. After alignment, some corresponding pairs are still inconsistent with the rest of the data and are considered outliers. These outliers shift the parameters of the mapping function away from their true values and hence negatively affect the learning of the mapping function during the training phase of the VC task. To the best of the authors' knowledge, the effect of outliers (and hence, of their removal) on the quality of the converted voice has not been much explored in the VC literature. Recent research has shown the effectiveness of outlier removal as a pre-processing step in VC. In this paper, we extend this study with a detailed theory and analysis. The proposed method uses a score distance estimated with Robust Principal Component Analysis (ROBPCA) to detect the outliers. In particular, the outliers are determined using a fixed cut-off on the score distances, based on the degrees of freedom of a chi-squared distribution; this cut-off is speaker-pair independent. The fixed cut-off follows from the assumption that the score distances follow the normal (i.e., Gaussian) distribution. However, this is a weak statistical assumption, even when many samples are available. Hence, in this paper, we propose to explore speaker-pair dependent cut-offs to detect the outliers. In addition, we present results on two widely used databases, namely, CMU-ARCTIC and the Voice Conversion Challenge (VCC) 2016, using various state-of-the-art VC methods. In particular, we show the effectiveness of outlier removal for Gaussian Mixture Model (GMM), Artificial Neural Network (ANN), and Deep Neural Network (DNN)-based VC techniques.
Furthermore, we present subjective and objective evaluations, using a 95% confidence interval to assess the statistical significance of the tests. We obtained an average relative reduction of 0.56% in Mel Cepstral Distortion (MCD) with the proposed outlier removal approach as a pre-processing step. In particular, with the proposed speaker-pair dependent cut-off, we observed relative improvements of 12.24% and 30.51% in speech quality, and absolute improvements of 39.7% and 4.27% in speaker similarity, for CMU-ARCTIC and VCC 2016, respectively.
dc.format.extent	127-152
dc.identifier.citation	Nirmesh J. Shah and Hemant A. Patil, "A novel approach to remove outliers for parallel voice conversion," Computer Speech & Language, vol. 58, pp. 127-152, Nov. 2019. doi: 10.1016/j.csl.2019.03.009
dc.identifier.doi	10.1016/j.csl.2019.03.009
dc.identifier.issn	1095-8363
dc.identifier.scopus	2-s2.0-85065090873
dc.identifier.uri	https://ir.daiict.ac.in/handle/dau.ir/1548
dc.identifier.wos	WOS:000477663800007
dc.language.iso	en
dc.publisher	Elsevier
dc.relation.ispartofseries	Vol. 58; No. C
dc.source	Computer Speech & Language
dc.source.uri	https://www.sciencedirect.com/science/article/pii/S0885230818300299?via%3Dihub
dc.title	A novel approach to remove outliers for parallel voice conversion
dspace.entity.type	Publication
relation.isAuthorOfPublication	fdb7041b-280e-498b-b2ee-34f9bc351f4c
relation.isAuthorOfPublication.latestForDiscovery	fdb7041b-280e-498b-b2ee-34f9bc351f4c
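The abstract describes flagging aligned pairs whose score distance exceeds a fixed, speaker-pair independent chi-squared cut-off. The sketch below illustrates only that fixed cut-off step and makes several assumptions. Classical PCA (via `numpy.linalg.svd`) is swapped in for ROBPCA, so unlike the paper's method the estimates are not robust. Each row of the input is assumed to be one aligned source-target feature pair. The 0.975 quantile and the helper name `score_distance_outliers` are illustrative choices and do not come from the record. The score distance of row i is taken as sqrt(sum_j t_ij^2 / l_j), where t_ij are the scores on the k retained components and l_j are their variances.

```python
import numpy as np
from scipy.stats import chi2


def score_distance_outliers(X, k, quantile=0.975):
    """Flag rows of X whose PCA score distance exceeds a fixed chi-squared cut-off.

    Hypothetical helper. X is an (n_pairs, dim) array; each row is assumed to be
    one aligned source-target feature pair (e.g. concatenated spectral vectors).
    Classical PCA stands in for ROBPCA, so these estimates are NOT robust.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                     # centre the data
    # Principal axes are the rows of Vt; s holds the singular values.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s[:k] ** 2 / (n - 1)              # variance along each retained axis
    scores = Xc @ Vt[:k].T                      # projections onto the k axes
    # Score distance: sqrt(sum_j t_ij^2 / l_j) over the k retained components
    sd = np.sqrt(np.sum(scores ** 2 / eigvals, axis=1))
    # Fixed, speaker-pair independent cut-off: sqrt of a chi-squared quantile, k dof
    cutoff = np.sqrt(chi2.ppf(quantile, df=k))
    return sd, cutoff, sd > cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pairs = rng.standard_normal((200, 4))               # synthetic "inlier" pairs
    pairs = np.vstack([pairs, np.full((1, 4), 50.0)])   # one gross outlier appended
    sd, cutoff, is_outlier = score_distance_outliers(pairs, k=2)
    print(f"cut-off = {cutoff:.3f}, outlier flagged: {is_outlier[-1]}")
    kept = pairs[~is_outlier]                            # pairs kept for training the mapping
```

Classical PCA is not robust. A cluster of outliers can therefore inflate the estimated variances and mask itself, and ROBPCA is designed to avoid that problem. The paper's speaker-pair dependent cut-off would replace the fixed `cutoff` value, but this record does not state how that value is computed, so the sketch does not attempt it.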

Collections

Journal Article