The iFLYTEK system for blizzard machine learning challenge 2017-ES1

Li-Juan Liu, Chuang Ding, Ya-Jun Hu, Zhenhua Ling, Yuan Jiang, M. Zhou, Si Wei
{"title":"The iFLYTEK system for blizzard machine learning challenge 2017-ES1","authors":"Li-Juan Liu, Chuang Ding, Ya-Jun Hu, Zhenhua Ling, Yuan Jiang, M. Zhou, Si Wei","doi":"10.1109/ASRU.2017.8268999","DOIUrl":null,"url":null,"abstract":"This paper introduces the speech synthesis system submitted by IFLYTEK for the Blizzard Machine Learning Challenge 2017-ES1. Linguistic and acoustic features from a 4hour corpus were released for this task. Participants are expected to build a speech synthesis system on the given linguist and acoustic features without using any external data. Our system is composed of a long short term memory (LSTM) recurrent neural network (RNN)-based acoustic model and a generative adversarial network (GAN)-based post-filter for mel-cepstra. Two approaches to build GAN-based post-filter are implemented and compared in our experiments. The first one is to predict the residuals of mel-cepstra given the mel-cepstra predicted by the LSTM-based acoustic model. However, this method leads to unstable synthetic speech sounds in our experiments, which may be due to the poor quality of analysis-synthesis speech using the natural acoustic features given by this corpus. The other approach is to ignore the detailed components of natural mel-cepstra by dimension reduction using principal component analysis (PCA) and then recover them back using GAN given the main PCA components. At synthesis time, mel-cepstra predicted by the RNN acoustic model are first projected to the main PCA components, which are then sent to the GAN for detail recovering. Finally, the second approach is used in the final submitted system. 
The evaluation results show the effectiveness of our submitted system.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

This paper introduces the speech synthesis system submitted by iFLYTEK for the Blizzard Machine Learning Challenge 2017-ES1. Linguistic and acoustic features from a 4-hour corpus were released for this task. Participants were expected to build a speech synthesis system on the given linguistic and acoustic features without using any external data. Our system is composed of a long short-term memory (LSTM) recurrent neural network (RNN)-based acoustic model and a generative adversarial network (GAN)-based post-filter for mel-cepstra. Two approaches to building the GAN-based post-filter are implemented and compared in our experiments. The first is to predict the residuals of the mel-cepstra given the mel-cepstra predicted by the LSTM-based acoustic model. However, this method led to unstable synthetic speech in our experiments, which may be due to the poor quality of analysis-synthesis speech produced from the natural acoustic features given by this corpus. The other approach is to discard the detailed components of natural mel-cepstra by dimension reduction using principal component analysis (PCA) and then recover them using a GAN conditioned on the main PCA components. At synthesis time, the mel-cepstra predicted by the RNN acoustic model are first projected onto the main PCA components, which are then sent to the GAN for detail recovery. The second approach was adopted in the final submitted system. The evaluation results show the effectiveness of our submitted system.
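The PCA step of the second approach can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame counts, the 40-dimensional mel-cepstra, and the choice of 10 retained components are assumptions for demonstration, and the GAN that performs the actual detail recovery is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for natural mel-cepstral frames: 1000 frames x 40 coefficients
# (dimensions are illustrative, not taken from the paper).
mcep = rng.standard_normal((1000, 40))

# Fit PCA on the natural mel-cepstra via SVD of the centered data.
mean = mcep.mean(axis=0)
_, _, vt = np.linalg.svd(mcep - mean, full_matrices=False)
k = 10              # number of "main" components kept (assumed)
basis = vt[:k]      # (k, 40) principal axes

# At synthesis time, mel-cepstra predicted by the acoustic model are
# projected onto the main PCA components ...
predicted = rng.standard_normal((5, 40))
main = (predicted - mean) @ basis.T      # shape (5, k)

# ... and a GAN (not shown) maps the main components back to full
# mel-cepstra, recovering the discarded detail. PCA's own linear
# back-projection is the naive reconstruction the GAN improves on:
naive_recovery = main @ basis + mean     # shape (5, 40)
```

The point of discarding the low-variance components before training is that the GAN then only has to model the fine spectral detail, rather than predicting residuals of already-smoothed acoustic-model output, which the paper reports was unstable.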