Non-Parallel Voice Conversion with Autoregressive Conversion Model and Duration Adjustment

Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020. Pub Date: 2020-10-30. DOI: 10.21437/vcc_bc.2020-17

Li-Juan Liu, Yan-Nian Chen, Jing-Xuan Zhang, Yuan Jiang, Ya-Jun Hu, Zhenhua Ling, Lirong Dai


Citations: 17

Abstract

Although the N10 system in the Voice Conversion Challenge 2018 (VCC 2018) achieved excellent voice conversion results in both speech naturalness and speaker similarity, its performance is limited by some modeling insufficiencies. In this paper, we propose to overcome these limitations by introducing three modifications. First, we replace the conversion model with an autoregressive one to improve its modeling capability; second, we use a high-fidelity WaveNet to model 24 kHz/16-bit waveforms to improve the naturalness of the converted speech; third, we propose a duration adjustment strategy to compensate for the obvious speech rate difference between source and target speakers. Experimental results show that the proposed method improves conversion performance significantly. Furthermore, we validate the system for cross-lingual voice conversion by applying it directly to the cross-lingual task in the Voice Conversion Challenge 2020 (VCC 2020). The released official subjective results show that our system obtains the best performance in naturalness of the converted speech and performance comparable to the best system in speaker similarity, which indicates that the proposed method can achieve state-of-the-art cross-lingual voice conversion performance as well.
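The abstract does not say how the duration adjustment is carried out. As a rough illustration only, and not the authors' method, the sketch below assumes a single global speech-rate ratio between the two speakers and stretches or compresses a frame-level feature sequence by linear interpolation along time. The function names, the ratio definition, and the "units" (for example phonemes) used to measure speech rate are all hypothetical.

```python
import numpy as np


def global_rate_ratio(src_frames: int, src_units: int,
                      tgt_frames: int, tgt_units: int) -> float:
    """Hypothetical global ratio: target frames per unit / source frames per unit.

    A value above 1 means the target speaker talks more slowly than the source.
    """
    return (tgt_frames / tgt_units) / (src_frames / src_units)


def adjust_duration(features: np.ndarray, rate_ratio: float) -> np.ndarray:
    """Resample a (frames, dims) feature sequence to round(frames * rate_ratio) frames.

    Assumption: uniform time scaling with per-dimension linear interpolation.
    """
    n_src, n_dims = features.shape
    n_out = max(1, int(round(n_src * rate_ratio)))
    # Fractional positions in the source sequence for each output frame.
    src_pos = np.linspace(0.0, n_src - 1, num=n_out)
    src_idx = np.arange(n_src)
    columns = [np.interp(src_pos, src_idx, features[:, d]) for d in range(n_dims)]
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    # Toy data: 10 frames of 2-dimensional features.
    feats = np.stack([np.arange(10.0), np.arange(10.0) * 2.0], axis=1)
    ratio = global_rate_ratio(src_frames=1000, src_units=100,
                              tgt_frames=1500, tgt_units=100)
    stretched = adjust_duration(feats, ratio)
    print(ratio, stretched.shape)
```

A real system would more likely adjust durations at the phoneme or segment level than with one global factor; the uniform scaling here is chosen only to keep the example short.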
