The USTC System for Blizzard Machine Learning Challenge 2017-ES2

Ya-Jun Hu, Li-Juan Liu, Chuang Ding, Zhenhua Ling, Lirong Dai
Published in: 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017-12-01
DOI: 10.1109/ASRU.2017.8268998 (https://doi.org/10.1109/ASRU.2017.8268998)
Citations: 4

Abstract

The Blizzard Machine Learning Challenge (BMLC) aims to free participants from speech-specific processing when building speech synthesis systems. This paper describes the USTC system for the ES2 sub-task of BMLC2017, which requires participants to train a model that predicts waveforms directly from linguistic features. We investigate three aspects of waveform modeling in preparing our system for this task. First, two model structures for waveform modeling, WaveNet and SampleRNN, are compared. Second, we study a strategy of using features extracted from waveforms as intermediate representations for waveform modeling. Experimental results show that using low-level features (STFT amplitude spectra) as intermediate representations achieves performance similar to that of using high-level features (mel-cepstra and F0). Third, experiments verify the feasibility of applying WaveNet to wideband speech signals with more than 256 quantization levels. Finally, the system submitted for evaluation adopts STFT amplitude spectra as intermediate representations to model 24 kHz speech waveforms with 1024 μ-law quantization levels. The evaluation results of BMLC2017 demonstrate the effectiveness of the proposed methods.
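
As an illustration of the quantization step mentioned in the abstract, the following is a minimal sketch of μ-law companding with 1024 levels, assuming waveform samples normalized to [-1, 1]. The function names, the rounding scheme, and the use of NumPy are assumptions made for this example; the abstract does not describe the authors' actual implementation.

```python
import numpy as np

def mu_law_encode(x, levels=1024):
    """Map samples in [-1, 1] to integer classes 0..levels-1 (hypothetical helper)."""
    mu = levels - 1
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    # Compand: compress large amplitudes logarithmically.
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Uniformly quantize the companded value from [-1, 1] to integer bins.
    return np.floor((y + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(q, levels=1024):
    """Invert mu_law_encode, returning approximate samples in [-1, 1]."""
    mu = levels - 1
    y = 2.0 * np.asarray(q, dtype=np.float64) / mu - 1.0
    # Expand: undo the logarithmic compression.
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

# Compare the worst-case round-trip error for 256 and 1024 levels.
x = np.linspace(-1.0, 1.0, 10001)
err_256 = np.max(np.abs(mu_law_decode(mu_law_encode(x, 256), 256) - x))
err_1024 = np.max(np.abs(mu_law_decode(mu_law_encode(x, 1024), 1024) - x))
print(f"max error, 256 levels:  {err_256:.4f}")
print(f"max error, 1024 levels: {err_1024:.4f}")
```

With more levels, the model's output layer must classify each sample into more categories, but the reconstruction error of the quantizer shrinks, which is the trade-off behind testing WaveNet beyond 256 levels.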