Improving the performance of MGM-based voice conversion by preparing training data method

Guoyu Zuo, Wenju Liu, Xiaogang Ruan
2004 International Symposium on Chinese Spoken Language Processing. Published 2004-12-15. DOI: 10.1109/CHINSL.2004.1409616

Abstract

This paper proposes an approach that improves both the target speaker's individuality and the quality of the converted speech through careful preparation of the training data. For voice conversion based on mixture Gaussian spectral mapping (MGM), spectral feature representations are analyzed to obtain correct feature associations between the source and target characteristics. A voiced/unvoiced (V/UV) decision scheme for time alignment is provided to select suitable data for training the MGM function and to remove misaligned data. Experiments apply different spectral representation methods and V/UV decision strategies to the MGM functions. When linear predictive cepstral coefficients (LPCC) are used for time alignment and V/UV decisions are used to remove bad data, the results show that the conversion function becomes more accurate and that the proposed method effectively improves the overall performance of voice conversion.
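The data-preparation idea can be illustrated with a minimal sketch. The abstract does not give the paper's alignment algorithm or its exact V/UV rule, so the code below makes two assumptions: frames are aligned with plain dynamic time warping (DTW) over per-frame feature vectors (LPCC vectors in the paper's setting, toy one-dimensional values here), and an aligned pair is kept only when the source and target frames carry the same V/UV label. The function names `dtw_path` and `keep_voicing_matched_pairs` are hypothetical.

```python
import numpy as np


def dtw_path(src, tgt):
    """Return a DTW alignment path between two feature sequences.

    src: (n, d) array, tgt: (m, d) array. Euclidean frame distance is an
    assumption; the paper's distance measure is not stated in the abstract.
    """
    n, m = len(src), len(tgt)
    # Pairwise Euclidean distances between every source and target frame.
    dist = np.linalg.norm(src[:, None, :] - tgt[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the final cell to recover the lowest-cost path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
        k = int(np.argmin(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


def keep_voicing_matched_pairs(path, src_voiced, tgt_voiced):
    """Drop aligned pairs whose V/UV labels disagree (assumed rule)."""
    return [(i, j) for i, j in path if src_voiced[i] == tgt_voiced[j]]


# Toy example: one-dimensional "features" and hand-written V/UV labels.
src = np.array([[0.0], [1.0], [2.0]])
tgt = np.array([[0.0], [0.0], [1.0], [2.0]])
path = dtw_path(src, tgt)
pairs = keep_voicing_matched_pairs(
    path,
    src_voiced=[True, True, False],
    tgt_voiced=[True, False, True, False],
)
```

Only the surviving pairs would then be passed on as training data for the mapping function. Other filtering rules, such as keeping voiced-voiced pairs only, fit the same structure.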