Neural TTS Voice Conversion

Zvi Kons, Slava Shechtman, A. Sorin, R. Hoory, Carmel Rabinovitz, E. Morais
Published in: 2018 IEEE Spoken Language Technology Workshop (SLT), 2018-12-01
DOI: 10.1109/SLT.2018.8639550 (https://doi.org/10.1109/SLT.2018.8639550)
Citations: 10

Abstract

Recently, speaker adaptation of neural TTS models has received significant interest, and several studies focusing on this topic have been published. All of them explore the adaptation of an initial multi-speaker model trained on a corpus containing tens to hundreds of individual speaker voices. In this work we focus on the challenging task of TTS voice conversion, where an initial system is trained on single-speaker data and then needs to be adapted to a variety of external speaker voices. The TTS voice conversion setup represents a very important use case: transcribed multi-speaker datasets might be unavailable for many languages, while any TTS technology provider is expected to have at least one suitable single-speaker dataset per supported language. We present a neural TTS system comprising separate prosody generator and synthesizer DNN models. The system is trained on a high-quality proprietary male speaker dataset. We show that the system models can be converted to a variety of external male and female ordinary voices and to an extremely expressive artist's voice, and we present crowd-based subjective evaluation results.