Improving the performance of MGM-based voice conversion by preparing training data method

2004 International Symposium on Chinese Spoken Language Processing, Pub Date: 2004-12-15, DOI: 10.1109/CHINSL.2004.1409616

Guoyu Zuo, Wenju Liu, Xiaogang Ruan

Citations: 0

Abstract

This paper proposes an approach that improves both the target speaker's individuality and the quality of the converted speech by preparing the training data. In mixture Gaussian spectral mapping (MGM) based voice conversion, spectral feature representations are analyzed to obtain correct feature associations between the source and target characteristics. A voiced/unvoiced (V/UV) decision scheme for time alignment is proposed to obtain correct data for training the MGM function while removing misaligned data. Experiments compare different spectral representation methods and V/UV decision strategies as applied to the MGM functions. Results show that when linear predictive cepstral coefficients (LPCC) are used for time alignment and V/UV decisions are used to remove bad data, the conversion function achieves better accuracy, and the proposed method effectively improves the overall performance of voice conversion.
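
The abstract does not spell out the alignment or data-rejection rules, so the following is only a minimal Python sketch under stated assumptions: source and target frames are aligned with classic dynamic time warping (DTW) over LPCC vectors using a Euclidean local cost, per-frame V/UV flags come from some external voicing detector, and an aligned frame pair is kept for MGM training only when both frames receive the same V/UV decision (or, optionally, only when both are voiced). The names `frame_distance`, `dtw_path`, `select_training_pairs`, and `voiced_only` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector, e.g. the LPCC coefficients of one frame


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(src: List[Frame], tgt: List[Frame]) -> List[Tuple[int, int]]:
    """Classic DTW with steps (1,1), (1,0), (0,1); returns aligned (i, j) index pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the final cell to recover the warping path (diagonal wins ties).
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


def select_training_pairs(
    src: List[Frame],
    tgt: List[Frame],
    src_voiced: Sequence[bool],
    tgt_voiced: Sequence[bool],
    voiced_only: bool = False,
) -> List[Tuple[Frame, Frame]]:
    """Align src/tgt frames, then drop pairs whose V/UV decisions disagree (assumed rule)."""
    pairs: List[Tuple[Frame, Frame]] = []
    for i, j in dtw_path(src, tgt):
        if src_voiced[i] != tgt_voiced[j]:
            continue  # V/UV mismatch: treat as a likely misalignment and discard
        if voiced_only and not src_voiced[i]:
            continue  # optional stricter rule: keep voiced-voiced pairs only
        pairs.append((src[i], tgt[j]))
    return pairs
```

Rejecting V/UV mismatches before fitting keeps pairs such as a vowel frame aligned to an unvoiced fricative frame out of the MGM training set. The paper compares several V/UV decision strategies whose exact rules are not given in the abstract, so the two options here are illustrative only.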
