A light-weight method of building an LSTM-RNN-based bilingual TTS system

Huaiping Ming, Yanfeng Lu, Zhengchen Zhang, M. Dong
2017 International Conference on Asian Language Processing (IALP), published 2017-12-01. DOI: 10.1109/IALP.2017.8300579 (https://doi.org/10.1109/IALP.2017.8300579)
Citation count: 20

Abstract

For a long time, text-to-speech (TTS) synthesis systems could only handle one language. Early bilingual TTS systems were constructed by directly combining two monolingual systems, with language switching. The bilingual speech generated by such systems normally contained two different voices, therefore causing unnatural, sometimes disturbing effects. A genuine bilingual TTS system should use a single voice and avoid switching between two independent monolingual systems. Accordingly, the difficulties of building genuine bilingual speech synthesizers lie in merging two different languages into the same system and preparing bilingual speech data with the same speaker. Various methods have been proposed to overcome these difficulties, including soft prosody prediction, phone, state and frame mapping, and most recently speaker and language factorization. Professional speakers who can speak two languages fluently are hard to find. In many cases a speaker can speak one language well, but the second only fairly. In this paper we propose an easy linguistic feature concatenation method to build a bilingual TTS system with data created by such a speaker, using an LSTM-RNN-based speech synthesizer. Both objective and subjective evaluations show the effectiveness of this method.
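The abstract names linguistic feature concatenation but does not describe its layout. As a minimal sketch of one plausible reading, and not the authors' confirmed design, each language's linguistic feature vector could occupy its own block of a shared input vector, with the inactive language's block zero-filled so that a single LSTM-RNN sees a fixed-size input. The dimensions, the language labels "A" and "B", and the function names below are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical feature sizes; the abstract does not give the real ones.
DIM_LANG_A = 4
DIM_LANG_B = 3


def concat_features(features: List[float], language: str) -> List[float]:
    """Place one language's features in its own block of a shared vector.

    The other language's block is zero-filled, so the output length is
    always DIM_LANG_A + DIM_LANG_B. This layout is an assumption.
    """
    if language == "A":
        if len(features) != DIM_LANG_A:
            raise ValueError("expected %d features for language A" % DIM_LANG_A)
        return list(features) + [0.0] * DIM_LANG_B
    if language == "B":
        if len(features) != DIM_LANG_B:
            raise ValueError("expected %d features for language B" % DIM_LANG_B)
        return [0.0] * DIM_LANG_A + list(features)
    raise ValueError("unknown language: %r" % language)


def encode_utterance(segments: List[Tuple[List[float], str]]) -> List[List[float]]:
    """Encode a mixed-language utterance as a sequence of shared-size vectors."""
    return [concat_features(feats, lang) for feats, lang in segments]


if __name__ == "__main__":
    # A code-switched utterance: two segments in language A, one in language B.
    utterance = [
        ([1.0, 0.0, 1.0, 0.0], "A"),
        ([0.0, 1.0, 0.0, 0.0], "A"),
        ([1.0, 1.0, 0.0], "B"),
    ]
    for vector in encode_utterance(utterance):
        print(vector)
```

Under this assumed layout, every frame has the same dimensionality regardless of language, which is what would let one acoustic model and one voice cover both languages instead of switching between two monolingual systems.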