基于自回归转换模型和时值调整的非并行语音转换

Li-Juan Liu, Yan-Nian Chen, Jing-Xuan Zhang, Yuan Jiang, Ya-Jun Hu, Zhenhua Ling, Lirong Dai
{"title":"基于自回归转换模型和时值调整的非并行语音转换","authors":"Li-Juan Liu, Yan-Nian Chen, Jing-Xuan Zhang, Yuan Jiang, Ya-Jun Hu, Zhenhua Ling, Lirong Dai","doi":"10.21437/vcc_bc.2020-17","DOIUrl":null,"url":null,"abstract":"Although N10 system in Voice Conversion Challenge 2018 (VCC 18) has achieved excellent voice conversion results in both speech naturalness and speaker similarity, the sys-tem’s performance is limited due to some modeling insuffi-ciency. In this paper, we propose to overcome these limita-tions by introducing three modifications. First, we substitute an autoregressive-based model in order to improve the conversion model capability; second, we use high-fidelity WaveNet to model 24kHz/16bit waveform in order to improve conversion speech naturalness; third, a duration adjustment strategy is proposed to compensate the obvious speech rate difference between source and target speakers. Experimental results show that our proposed method can improve the conversion performance significantly. Furthermore, we validate the performance of this system for cross-lingual voice conversion by applying it directly to the cross-lingual task in Voice Conversion Challenge 2020 (VCC 2020). The released official subjective results show that our system obtains the best performance in conversion speech naturalness and comparable performance to the best system in speaker similarity, which indicate that our proposed method can achieve state-of-the-art cross-lingual voice conversion performance as well.","PeriodicalId":355114,"journal":{"name":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment\",\"authors\":\"Li-Juan Liu, Yan-Nian Chen, Jing-Xuan Zhang, Yuan Jiang, Ya-Jun Hu, Zhenhua Ling, Lirong Dai\",\"doi\":\"10.21437/vcc_bc.2020-17\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although N10 system in Voice Conversion Challenge 2018 (VCC 18) has achieved excellent voice conversion results in both speech naturalness and speaker similarity, the sys-tem’s performance is limited due to some modeling insuffi-ciency. In this paper, we propose to overcome these limita-tions by introducing three modifications. First, we substitute an autoregressive-based model in order to improve the conversion model capability; second, we use high-fidelity WaveNet to model 24kHz/16bit waveform in order to improve conversion speech naturalness; third, a duration adjustment strategy is proposed to compensate the obvious speech rate difference between source and target speakers. Experimental results show that our proposed method can improve the conversion performance significantly. Furthermore, we validate the performance of this system for cross-lingual voice conversion by applying it directly to the cross-lingual task in Voice Conversion Challenge 2020 (VCC 2020). The released official subjective results show that our system obtains the best performance in conversion speech naturalness and comparable performance to the best system in speaker similarity, which indicate that our proposed method can achieve state-of-the-art cross-lingual voice conversion performance as well.\",\"PeriodicalId\":355114,\"journal\":{\"name\":\"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/vcc_bc.2020-17\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/vcc_bc.2020-17","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

摘要

尽管N10系统在2018年语音转换挑战赛(VCC 18)中在语音自然度和说话人相似度方面都取得了出色的语音转换效果,但由于一些建模不足,系统的性能受到限制。在本文中,我们建议通过引入三个修改来克服这些限制。首先,为了提高模型转换能力,我们用自回归模型代替模型;其次,采用高保真WaveNet对24kHz/16bit波形进行建模,提高转换语音的自然度;第三,提出了一种时长调整策略来补偿源语和目标语明显的语速差异。实验结果表明,该方法能显著提高转换性能。此外,我们通过将该系统直接应用于语音转换挑战2020 (VCC 2020)中的跨语言任务,验证了该系统在跨语言语音转换方面的性能。官方发布的主观测试结果表明,我们的系统在转换语音的自然度方面取得了最好的性能,并且在说话人相似度方面取得了与最佳系统相当的性能,这表明我们的方法也可以实现最先进的跨语言语音转换性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment
Although N10 system in Voice Conversion Challenge 2018 (VCC 18) has achieved excellent voice conversion results in both speech naturalness and speaker similarity, the sys-tem’s performance is limited due to some modeling insuffi-ciency. In this paper, we propose to overcome these limita-tions by introducing three modifications. First, we substitute an autoregressive-based model in order to improve the conversion model capability; second, we use high-fidelity WaveNet to model 24kHz/16bit waveform in order to improve conversion speech naturalness; third, a duration adjustment strategy is proposed to compensate the obvious speech rate difference between source and target speakers. Experimental results show that our proposed method can improve the conversion performance significantly. Furthermore, we validate the performance of this system for cross-lingual voice conversion by applying it directly to the cross-lingual task in Voice Conversion Challenge 2020 (VCC 2020). The released official subjective results show that our system obtains the best performance in conversion speech naturalness and comparable performance to the best system in speaker similarity, which indicate that our proposed method can achieve state-of-the-art cross-lingual voice conversion performance as well.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信