Merlin: An Open Source Neural Network Speech Synthesis System

Speech Synthesis Workshop. Pub Date: 2016-09-15. DOI: 10.21437/SSW.2016-33

Zhizheng Wu, O. Watts, Simon King


Citations: 320

Abstract


Merlin: An Open Source Neural Network Speech Synthesis System

We introduce the Merlin speech synthesis toolkit for neural network-based speech synthesis. The system takes linguistic features as input, and employs neural networks to predict acoustic features, which are then passed to a vocoder to produce the speech waveform. Various neural network architectures are implemented, including a standard feedforward neural network, mixture density neural network, recurrent neural network (RNN), long short-term memory (LSTM) recurrent neural network, amongst others. The toolkit is Open Source, written in Python, and is extensible. This paper briefly describes the system, and provides some benchmarking results on a freely-available corpus.
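
The middle stage of the pipeline described in the abstract (linguistic features in, acoustic features out) can be illustrated with a minimal sketch. This is not Merlin's API or the authors' implementation: the function names, layer sizes, and random untrained weights are all assumptions chosen for illustration, and the vocoder stage is left out entirely.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation=None):
    """Apply one dense layer to the vector x, with an optional activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def predict_acoustic(linguistic_frames, hidden, output):
    """Map each frame's linguistic feature vector to an acoustic feature vector."""
    return [dense(dense(frame, hidden, math.tanh), output)
            for frame in linguistic_frames]


# Hypothetical dimensions; a real system would use far larger feature vectors.
N_LINGUISTIC, N_HIDDEN, N_ACOUSTIC = 8, 16, 4

rng = random.Random(0)
hidden = init_layer(N_LINGUISTIC, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_ACOUSTIC, rng)

# Five frames of made-up linguistic features stand in for real front-end output.
frames = [[rng.random() for _ in range(N_LINGUISTIC)] for _ in range(5)]
acoustic = predict_acoustic(frames, hidden, output)

print(len(acoustic), len(acoustic[0]))  # 5 4
```

In a full system the predicted acoustic frames would be handed to a vocoder to generate the waveform, and the weights would be trained on a speech corpus rather than drawn at random.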


Source journal

Speech Synthesis Workshop

Self-citation rate

0.00%

Publication count