Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters

2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG). Pub Date: 2023-01-05. DOI: 10.1109/FG57933.2023.10042520

Joakim Gustafson, Éva Székely, Simon Alexandersson, J. Beskow


Citations: 0

Abstract


Casual chatter or speaking up? Adjusting articulatory effort in generation of speech and animation for conversational characters

Embodied conversational agents and social robots need to be able to generate spontaneous behavior in order to be believable in social interactions. We present a system that can generate spontaneous speech with supporting lip movements. The conversational TTS voice is trained on a podcast corpus that has been prosodically tagged (f0, speaking rate and energy) and transcribed (including tokens for breathing, fillers and laughter). We introduce a speech animation algorithm where articulatory effort can be adjusted. The speech animation is driven by time-stamped phonemes obtained from the internal alignment attention map of the TTS system, and we use prominence estimates from the synthesised speech waveform to modulate the lip- and jaw movements accordingly.
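The two mechanisms named in the abstract, phoneme timing read from an attention alignment and prominence-modulated jaw movement, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `phoneme_timestamps` and `jaw_opening`, the argmax-based reading of the alignment, the fixed frame duration, and the multiplicative scaling rule are all assumptions made for illustration only.

```python
from typing import List, Tuple


def phoneme_timestamps(
    alignment: List[List[float]], phonemes: List[str], frame_sec: float
) -> List[Tuple[str, float, float]]:
    """Turn an attention map into (phoneme, start, end) spans.

    alignment[t][p] is assumed to be the attention weight that output
    frame t places on input phoneme p (hypothetical layout).
    """
    # For each output frame, pick the phoneme with the largest weight.
    owners = [max(range(len(row)), key=row.__getitem__) for row in alignment]
    spans = []
    start = 0
    for t in range(1, len(owners) + 1):
        # Close a span when the attended phoneme changes or frames run out.
        if t == len(owners) or owners[t] != owners[start]:
            spans.append(
                (phonemes[owners[start]], start * frame_sec, t * frame_sec)
            )
            start = t
    return spans


def jaw_opening(base: float, prominence: float, effort: float) -> float:
    """Scale a base jaw opening in [0, 1] by effort and prominence.

    The multiplicative rule is an assumed stand-in, not the paper's formula.
    """
    scaled = base * effort * (1.0 + prominence)
    # Clamp to the valid range of the (hypothetical) animation parameter.
    return max(0.0, min(1.0, scaled))
```

In this sketch, an `effort` value below 1 would give the reduced articulation of casual chatter, while a value above 1 would exaggerate the movement for speaking up; prominent syllables open the jaw further in either setting.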

