The OPPO System for the Blizzard Challenge 2020

Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020. Pub Date: 2020-10-30. DOI: 10.21437/vcc_bc.2020-3

Yang Song, Min-Siong Liang, Guilin Yang, Kun Xie, Jie Hao

Abstract

This paper presents the OPPO text-to-speech system for the Blizzard Challenge 2020. We built a system based on statistical parametric speech synthesis, with improvements in both the frontend and the backend. For the Mandarin task, a BERT model was used for the frontend, and a Tacotron acoustic model and a WaveRNN vocoder were used for the backend. For the Shanghainese task, the frontend was built from scratch, and a Tacotron acoustic model and a MelGAN vocoder were used for the backend. In the Mandarin task, evaluation results showed that our system performed best in naturalness and achieved near-best results in similarity. In the Shanghainese task, our system scored poorly on most indicators.
