{"title":"2020暴雪挑战赛的腾讯语音合成系统","authors":"Qiao Tian, Zewang Zhang, Linghui Chen, Heng Lu, Chengzhu Yu, Chao Weng, Dong Yu","doi":"10.21437/vcc_bc.2020-4","DOIUrl":null,"url":null,"abstract":"This paper presents the Tencent speech synthesis system for Blizzard Challenge 2020. The corpus released to the partici-pants this year included a TV’s news broadcasting corpus with a length around 8 hours by a Chinese male host (2020-MH1 task), and a Shanghaiese speech corpus with a length around 6 hours (2020-SS1 task). We built a DurIAN-based speech synthesis system for 2020-MH1 task and Tacotron-based system for 2020-SS1 task. For 2020-MH1 task, firstly, a multi-speaker DurIAN-based acoustic model was trained based on linguistic feature to predict mel spectrograms. Then the model was fine-tuned on only the corpus provided. For 2020-SS1 task, instead of training based on hard-aligned phone boundaries, a Tacotron-like end-to-end system is applied to learn the mappings between phonemes and mel spectrograms. Finally, a modified version of WaveRNN model conditioning on the predicted mel spectrograms is trained to generate speech waveform. Our team is identified as L and the evaluation results shows our systems perform very well in various tests. Especially, we took the first place in the overall speech intelligibility test.","PeriodicalId":355114,"journal":{"name":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"The Tencent speech synthesis system for Blizzard Challenge 2020\",\"authors\":\"Qiao Tian, Zewang Zhang, Linghui Chen, Heng Lu, Chengzhu Yu, Chao Weng, Dong Yu\",\"doi\":\"10.21437/vcc_bc.2020-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents the Tencent speech synthesis system for Blizzard Challenge 2020. The corpus released to the partici-pants this year included a TV’s news broadcasting corpus with a length around 8 hours by a Chinese male host (2020-MH1 task), and a Shanghaiese speech corpus with a length around 6 hours (2020-SS1 task). We built a DurIAN-based speech synthesis system for 2020-MH1 task and Tacotron-based system for 2020-SS1 task. For 2020-MH1 task, firstly, a multi-speaker DurIAN-based acoustic model was trained based on linguistic feature to predict mel spectrograms. Then the model was fine-tuned on only the corpus provided. For 2020-SS1 task, instead of training based on hard-aligned phone boundaries, a Tacotron-like end-to-end system is applied to learn the mappings between phonemes and mel spectrograms. Finally, a modified version of WaveRNN model conditioning on the predicted mel spectrograms is trained to generate speech waveform. Our team is identified as L and the evaluation results shows our systems perform very well in various tests. 
Especially, we took the first place in the overall speech intelligibility test.\",\"PeriodicalId\":355114,\"journal\":{\"name\":\"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020\",\"volume\":\"77 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/vcc_bc.2020-4\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/vcc_bc.2020-4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The Tencent speech synthesis system for Blizzard Challenge 2020
This paper presents the Tencent speech synthesis system for Blizzard Challenge 2020. The corpus released to participants this year included a TV news broadcasting corpus of about 8 hours read by a Chinese male host (2020-MH1 task) and a Shanghainese speech corpus of about 6 hours (2020-SS1 task). We built a DurIAN-based speech synthesis system for the 2020-MH1 task and a Tacotron-based system for the 2020-SS1 task. For the 2020-MH1 task, a multi-speaker DurIAN-based acoustic model was first trained on linguistic features to predict mel spectrograms, and was then fine-tuned on only the provided corpus. For the 2020-SS1 task, instead of training on hard-aligned phone boundaries, a Tacotron-like end-to-end system was used to learn the mapping between phonemes and mel spectrograms. Finally, a modified WaveRNN model conditioned on the predicted mel spectrograms was trained to generate the speech waveform. Our team is identified as L, and the evaluation results show that our systems perform very well across the various tests; in particular, we took first place in the overall speech intelligibility test.
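To make the two-stage structure described above concrete (an acoustic model predicting mel spectrograms from phoneme or linguistic inputs, followed by a neural vocoder conditioned on those spectrograms), here is a minimal Python sketch with placeholder models. The mel channel count, hop length, toy duration expansion, and random "models" are illustrative assumptions, not the authors' DurIAN, Tacotron, or WaveRNN implementations.

# Illustrative sketch of a two-stage TTS pipeline: acoustic model -> mel
# spectrogram -> vocoder -> waveform. All numbers and models are placeholders.
import numpy as np

N_MELS = 80        # mel-spectrogram channels (typical value, assumed)
HOP_LENGTH = 240   # waveform samples generated per mel frame (assumed)

def acoustic_model(phoneme_ids: np.ndarray) -> np.ndarray:
    """Stand-in for a DurIAN/Tacotron-style acoustic model: maps a phoneme
    (or linguistic-feature) sequence to a sequence of mel frames."""
    rng = np.random.default_rng(0)
    n_frames = 4 * len(phoneme_ids)          # toy duration expansion
    return rng.standard_normal((n_frames, N_MELS))

def vocoder(mel: np.ndarray) -> np.ndarray:
    """Stand-in for a WaveRNN-style vocoder: generates waveform samples
    autoregressively, conditioned on the mel spectrogram frame by frame."""
    rng = np.random.default_rng(1)
    wav = np.zeros(len(mel) * HOP_LENGTH, dtype=np.float32)
    prev = 0.0
    for t, frame in enumerate(mel):
        cond = float(np.tanh(frame.mean()))   # toy conditioning signal
        for i in range(HOP_LENGTH):
            # each sample depends on the previous sample and the conditioning
            prev = 0.95 * prev + 0.05 * cond + 0.01 * rng.standard_normal()
            wav[t * HOP_LENGTH + i] = prev
    return wav

if __name__ == "__main__":
    phonemes = np.array([3, 14, 15, 9, 2, 6])  # dummy phoneme IDs
    mel = acoustic_model(phonemes)
    audio = vocoder(mel)
    print(mel.shape, audio.shape)              # (24, 80) (5760,)

In the paper's setup, the acoustic-model stage differs between tasks (DurIAN driven by linguistic features and phone durations for 2020-MH1, a Tacotron-like attention-based model for 2020-SS1), while the vocoder stage is shared in spirit: a modified WaveRNN conditioned on the predicted mel spectrograms.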