Novel Pre-processing using Outlier Removal in Voice Conversion

Speech Synthesis Workshop. Pub Date: 2016-09-13. DOI: 10.21437/SSW.2016-22

S. Rao, Nirmesh J. Shah, H. Patil


Citations: 13

Abstract

Voice conversion (VC) modifies a speech utterance spoken by a source speaker so that it sounds as if a target speaker had spoken it. Gaussian Mixture Model (GMM)-based VC is a state-of-the-art method. It finds the mapping function by modeling the joint density of the source and target speakers' features with a GMM, and then converts the spectral features frame by frame. As with any real dataset, the spectral parameters contain a few points that are inconsistent with the rest of the data, called outliers. Until now, there has been very little literature on the effect of outliers in voice conversion. In this paper, we explore that effect by removing outliers as a pre-processing step. To detect the outliers, we use the score distance, which is computed from the scores estimated by Robust Principal Component Analysis (ROBPCA). A point is flagged as an outlier when its score distance exceeds a cut-off value based on the degrees of freedom of a chi-squared distribution. The flagged points are then removed from the training dataset, and a GMM is trained on the remaining, least outlying points. This pre-processing step can be applied to various VC methods. Experimental results indicate a clear improvement in both the objective (8%) and the subjective (4% for MOS and 5% for XAB) results.
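The outlier-removal step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: plain PCA centered on the coordinate-wise median stands in for ROBPCA (so the estimates are not robust in the ROBPCA sense), the 0.975 quantile is the value commonly used with score distances and is not stated in the abstract, and the names `score_distances` and `inlier_mask` are hypothetical. Each row of `X` is assumed to be one feature vector, for example a joint source-target spectral frame.

```python
import numpy as np
from scipy.stats import chi2


def score_distances(X, n_components):
    """Score distance of each row of X within a PCA subspace.

    Assumption: classical PCA around the median replaces ROBPCA here,
    so this only illustrates the distance and cut-off logic.
    """
    X = np.asarray(X, dtype=float)
    centered = X - np.median(X, axis=0)
    cov = centered.T @ centered / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = centered @ eigvecs[:, order]  # projections onto the top components
    # Each squared score is scaled by the variance of its component.
    return np.sqrt(np.sum(scores**2 / eigvals[order], axis=1))


def inlier_mask(X, n_components, quantile=0.975):
    """True for rows whose score distance is within the chi-squared cut-off."""
    # The degrees of freedom equal the number of retained components.
    cutoff = np.sqrt(chi2.ppf(quantile, df=n_components))
    return score_distances(X, n_components) <= cutoff
```

Under these assumptions, `X[inlier_mask(X, n_components=k)]` would give the reduced training set on which the GMM is then fitted; the choice of `k` is left open here because the abstract does not specify it.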
