Comprehensive Voice Conversion Analysis Based on DGMM and Feature Combination

He Pan, Yangjie Wei, Nan Guan, Yi Wang
Published in: 2014 8th Asia Modelling Symposium (2014-09-23)
DOI: 10.1109/AMS.2014.39 (https://doi.org/10.1109/AMS.2014.39)
Citations: 1

Abstract

A voice conversion system modifies a source speaker's voice so that it is perceived as having been uttered by another (target) speaker, and such systems are now widely used in real applications. However, most research focuses on only one aspect of voice conversion performance, and little theoretical analysis or experimental comparison of the whole source-target speaker voice conversion process has been reported. Therefore, this paper presents a comprehensive analysis of source-target speaker voice conversion organized around three key steps (acoustic feature selection and extraction, conversion model construction, and target speech synthesis) and proposes a complete, optimized source-target voice conversion process. First, a comprehensive feature combination consisting of prosodic features, spectral parameters, and spectral envelope characteristics is proposed. Then, to avoid discontinuity and spectral distortion in the converted speech, a DGMM (Dynamic Gaussian Mixture Model) that takes inter-frame dynamic information into account is presented. Next, for speech synthesis, the STRAIGHT synthesizer is modified to work with the feature combination. Finally, objective comparison experiments show that the new source-target voice conversion process outperforms conventional methods. In addition, a speaker recognition system is used to evaluate the converted speech, and the results show that it has higher target-speaker individuality and better speech quality.
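The DGMM described above is motivated by "dynamic information between frames". The abstract does not say how that information is represented; a common choice in GMM-based voice conversion is to append first-order delta features to each frame's static features. The sketch below assumes that choice and the simple window delta[t] = (c[t+1] - c[t-1]) / 2; the function name `append_deltas` is hypothetical and not taken from the paper.

```python
import numpy as np


def append_deltas(static: np.ndarray) -> np.ndarray:
    """Append first-order delta features to a (T, D) matrix of static features.

    Assumed window: delta[t] = (c[t+1] - c[t-1]) / 2. The first and last frames
    are padded by repeating the edge frame. Returns a (T, 2D) matrix
    [static | delta].
    """
    # Repeat the first and last frames so every frame has two neighbours.
    padded = np.pad(static, ((1, 1), (0, 0)), mode="edge")
    # Central difference between the next and previous frames.
    delta = 0.5 * (padded[2:] - padded[:-2])
    return np.hstack([static, delta])


if __name__ == "__main__":
    frames = np.array([[0.0], [1.0], [3.0], [6.0]])
    print(append_deltas(frames))
```

For the four one-dimensional frames in the usage example the delta column is 0.5, 1.5, 2.5, 1.5. Modeling static and delta vectors jointly is one way a conversion model can penalize frame-to-frame jumps; whether the paper's DGMM uses exactly this window is not stated in the abstract.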
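As a reference point for the "conventional methods" the abstract compares against, the sketch below shows the widely used frame-wise joint-density GMM mapping, not the authors' DGMM. Given a GMM trained on stacked source/target vectors [x | y], each source frame is converted as y_hat = sum_m p(m | x) (mu_y_m + S_yx_m S_xx_m^{-1} (x - mu_x_m)). Training (e.g. EM on time-aligned parallel frames) is omitted, and the function names `gaussian_pdf` and `convert_frames` are hypothetical.

```python
import numpy as np


def gaussian_pdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Multivariate normal density evaluated at each row of x (T, D)."""
    d = x.shape[-1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    # Row-wise quadratic form diff[t] @ inv @ diff[t].
    return np.exp(-0.5 * np.einsum("ti,ij,tj->t", diff, inv, diff)) / norm


def convert_frames(x, weights, means, covs):
    """Map source frames x (T, D) to the target space with a joint-density GMM.

    weights: (M,) mixture weights
    means:   (M, 2D) joint means [mu_x | mu_y]
    covs:    (M, 2D, 2D) joint covariances [[S_xx, S_xy], [S_yx, S_yy]]
    """
    d = x.shape[1]
    m_count = len(weights)
    # Posterior p(m | x_t) from each component's source-side marginal.
    lik = np.stack(
        [weights[m] * gaussian_pdf(x, means[m, :d], covs[m, :d, :d]) for m in range(m_count)],
        axis=1,
    )
    post = lik / lik.sum(axis=1, keepdims=True)

    y = np.zeros_like(x)
    for m in range(m_count):
        mu_x, mu_y = means[m, :d], means[m, d:]
        s_xx = covs[m, :d, :d]
        s_yx = covs[m, d:, :d]
        # Component-wise linear regression: mu_y + S_yx S_xx^{-1} (x - mu_x),
        # written for row vectors as (x - mu_x) @ S_xx^{-1} S_yx^T.
        reg = mu_y + (x - mu_x) @ np.linalg.solve(s_xx, s_yx.T)
        # Weight each regression by its posterior and accumulate.
        y += post[:, [m]] * reg
    return y
```

With a single component whose joint mean is [0, 1] and joint covariance [[1, 0.5], [0.5, 1]], the mapping reduces to y = 1 + 0.5 x. Because each frame is converted independently, this baseline can produce the discontinuities that the abstract says the DGMM is meant to avoid. For high-dimensional spectral features, computing the posteriors in the log domain would be numerically safer than this direct form.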
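The feature combination includes a prosodic feature, but the abstract does not say how it is converted. A common baseline for fundamental frequency (F0) is a mean/variance transform in the log-F0 domain, with unvoiced frames (F0 = 0) left unchanged. The sketch below assumes that baseline; `convert_f0` and its parameter names are hypothetical, and the statistics are assumed to be computed over each speaker's voiced frames.

```python
import numpy as np


def convert_f0(f0_src, src_mean, src_std, tgt_mean, tgt_std):
    """Log-domain mean/variance F0 transform; unvoiced frames (F0 == 0) stay 0.

    src_mean/src_std and tgt_mean/tgt_std are the mean and standard deviation
    of log F0 over the voiced frames of the source and target speakers.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    # Normalize with source statistics, then rescale with target statistics.
    log_f0 = np.log(f0_src[voiced])
    out[voiced] = np.exp((log_f0 - src_mean) / src_std * tgt_std + tgt_mean)
    return out


if __name__ == "__main__":
    print(convert_f0([100.0, 0.0, 200.0], np.log(100.0), 1.0, np.log(200.0), 1.0))
```

With equal standard deviations and a target log-mean one octave above the source, every voiced frame is doubled (100 Hz becomes 200 Hz) and the unvoiced frame stays at 0. In practice, the converted F0 contour would then be passed to the synthesizer (STRAIGHT in this paper) together with the converted spectral features.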