基于引文注释文本预测基于vqae的角色表演风格用于有声读物语音合成

Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, H. Saruwatari
{"title":"基于引文注释文本预测基于vqae的角色表演风格用于有声读物语音合成","authors":"Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, H. Saruwatari","doi":"10.21437/interspeech.2022-638","DOIUrl":null,"url":null,"abstract":"We propose a speech-synthesis model for predicting appropriate voice styles on the basis of the character-annotated text for audiobook speech synthesis. An audiobook is more engaging when the narrator makes distinctive voices depending on the story characters. Our goal is to produce such distinctive voices in the speech-synthesis framework. However, such distinction has not been extensively investigated in audiobook speech synthesis. To enable the speech-synthesis model to achieve distinctive voices depending on characters with minimum extra anno-tation, we propose a speech synthesis model to predict character appropriate voices from quotation-annotated text. Our proposed model involves character-acting-style extraction based on a vector quantized variational autoencoder, and style prediction from quotation-annotated texts which enables us to automate audiobook creation with character-distinctive voices from quotation-annotated texts. To the best of our knowledge, this is the first attempt to model intra-speaker voice style depending on character acting for audiobook speech synthesis. We conducted subjective evaluations of our model, and the results indicate that the proposed model generated more distinctive character voices compared to models that do not use the explicit character-acting-style while maintaining the naturalness of synthetic speech.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"4551-4555"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis\",\"authors\":\"Wataru Nakata, Tomoki Koriyama, Shinnosuke Takamichi, Yuki Saito, Yusuke Ijima, Ryo Masumura, H. Saruwatari\",\"doi\":\"10.21437/interspeech.2022-638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a speech-synthesis model for predicting appropriate voice styles on the basis of the character-annotated text for audiobook speech synthesis. An audiobook is more engaging when the narrator makes distinctive voices depending on the story characters. Our goal is to produce such distinctive voices in the speech-synthesis framework. However, such distinction has not been extensively investigated in audiobook speech synthesis. To enable the speech-synthesis model to achieve distinctive voices depending on characters with minimum extra anno-tation, we propose a speech synthesis model to predict character appropriate voices from quotation-annotated text. Our proposed model involves character-acting-style extraction based on a vector quantized variational autoencoder, and style prediction from quotation-annotated texts which enables us to automate audiobook creation with character-distinctive voices from quotation-annotated texts. To the best of our knowledge, this is the first attempt to model intra-speaker voice style depending on character acting for audiobook speech synthesis. We conducted subjective evaluations of our model, and the results indicate that the proposed model generated more distinctive character voices compared to models that do not use the explicit character-acting-style while maintaining the naturalness of synthetic speech.\",\"PeriodicalId\":73500,\"journal\":{\"name\":\"Interspeech\",\"volume\":\"1 1\",\"pages\":\"4551-4555\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Interspeech\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/interspeech.2022-638\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-638","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

摘要

我们提出了一种语音合成模型,用于在有声读物语音合成的字符注释文本的基础上预测合适的语音风格。当叙述者根据故事人物发出独特的声音时,有声读物会更有吸引力。我们的目标是在语音合成框架中产生这样独特的声音。然而,这种区别并没有在有声读物语音合成中得到广泛的研究。为了使语音合成模型能够以最小的额外注释实现依赖于字符的独特语音,我们提出了一种语音合成模型来从引用注释文本中预测适合字符的语音。我们提出的模型包括基于矢量量化变分自动编码器的角色表演风格提取,以及从引文注释文本中进行风格预测,这使我们能够使用引文注释文本的角色独特声音自动创建有声读物。据我们所知,这是第一次尝试根据有声读物语音合成中的角色表演来模拟说话者内部的声音风格。我们对我们的模型进行了主观评估,结果表明,与不使用明确的角色表演风格同时保持合成语音自然度的模型相比,所提出的模型生成了更具特色的角色声音。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Predicting VQVAE-based Character Acting Style from Quotation-Annotated Text for Audiobook Speech Synthesis
We propose a speech-synthesis model for predicting appropriate voice styles on the basis of the character-annotated text for audiobook speech synthesis. An audiobook is more engaging when the narrator makes distinctive voices depending on the story characters. Our goal is to produce such distinctive voices in the speech-synthesis framework. However, such distinction has not been extensively investigated in audiobook speech synthesis. To enable the speech-synthesis model to achieve distinctive voices depending on characters with minimum extra anno-tation, we propose a speech synthesis model to predict character appropriate voices from quotation-annotated text. Our proposed model involves character-acting-style extraction based on a vector quantized variational autoencoder, and style prediction from quotation-annotated texts which enables us to automate audiobook creation with character-distinctive voices from quotation-annotated texts. To the best of our knowledge, this is the first attempt to model intra-speaker voice style depending on character acting for audiobook speech synthesis. We conducted subjective evaluations of our model, and the results indicate that the proposed model generated more distinctive character voices compared to models that do not use the explicit character-acting-style while maintaining the naturalness of synthetic speech.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信