Text-driven Visual Prosody Generation for Embodied Conversational Agents

Jiali Chen, Yong Liu, Zhimeng Zhang, Changjie Fan, Yu Ding
{"title":"Text-driven Visual Prosody Generation for Embodied Conversational Agents","authors":"Jiali Chen, Yong Liu, Zhimeng Zhang, Changjie Fan, Yu Ding","doi":"10.1145/3308532.3329445","DOIUrl":null,"url":null,"abstract":"In face-to-face conversations, head motions play a crucial role in encoding information, and humans are very skilled at decoding multiple messages from interlocutors' head motions. It is of great importance to endow embodied conversational agents (ECAs) with the capability of conveying communicative intention through head movements. Our work is aimed at automatically synthesizing head motions for an ECA speaking Chinese. We propose to take only transcripts as input to compute head movements, based on a statistical framework. Subjective experiments are conducted to validate the proposed statistical framework. The results show that the generated head animation is able to improve human perception in terms of naturalness and demonstrate that the head animation is synchronized with the input of synthetic speech.","PeriodicalId":112642,"journal":{"name":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308532.3329445","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In face-to-face conversations, head motions play a crucial role in encoding information, and humans are very skilled at decoding multiple messages from interlocutors' head motions. It is therefore important to endow embodied conversational agents (ECAs) with the capability of conveying communicative intention through head movements. Our work aims to automatically synthesize head motions for an ECA speaking Chinese. We propose to compute head movements from transcripts alone, based on a statistical framework. Subjective experiments are conducted to validate the proposed framework. The results show that the generated head animation improves perceived naturalness and is synchronized with the input synthetic speech.
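The abstract does not detail the statistical framework itself, but the general idea of mapping transcript-derived prosodic features to head-pose trajectories can be illustrated with a toy sketch. Everything below (the token classes, the `MOTION_STATS` table, and the smoothing step) is a hypothetical simplification for illustration, not the authors' model; a real system would estimate such statistics from motion-capture data aligned with transcripts.

```python
# Hypothetical sketch: sample a head-pose keyframe per prosodic token
# from per-class Gaussian statistics, then smooth for continuity.
import random

# Toy "learned" statistics: (mean, std) of head pitch and yaw in degrees
# per prosodic token class. Illustrative values only.
MOTION_STATS = {
    "stressed":   {"pitch": (6.0, 1.5),  "yaw": (0.0, 2.0)},
    "unstressed": {"pitch": (1.0, 0.8),  "yaw": (0.0, 1.0)},
    "pause":      {"pitch": (-2.0, 1.0), "yaw": (3.0, 1.5)},
}

def sample_keyframes(token_classes, seed=0):
    """Draw one head-pose keyframe per token from its class statistics."""
    rng = random.Random(seed)
    frames = []
    for cls in token_classes:
        stats = MOTION_STATS[cls]
        frames.append({
            "pitch": rng.gauss(*stats["pitch"]),
            "yaw": rng.gauss(*stats["yaw"]),
        })
    return frames

def smooth(frames, window=3):
    """Moving-average smoothing so the motion is continuous, not jittery."""
    out = []
    for i in range(len(frames)):
        lo = max(0, i - window // 2)
        hi = min(len(frames), i + window // 2 + 1)
        out.append({
            k: sum(f[k] for f in frames[lo:hi]) / (hi - lo)
            for k in ("pitch", "yaw")
        })
    return out

# A short transcript reduced to prosodic token classes.
tokens = ["stressed", "unstressed", "unstressed", "pause", "stressed"]
keyframes = smooth(sample_keyframes(tokens))
```

The resulting `keyframes` list (one pitch/yaw pair per token) would then drive the ECA's head joint at the corresponding syllable timings supplied by the speech synthesizer.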