{"title":"Text-driven Visual Prosody Generation for Embodied Conversational Agents","authors":"Jiali Chen, Yong Liu, Zhimeng Zhang, Changjie Fan, Yu Ding","doi":"10.1145/3308532.3329445","DOIUrl":null,"url":null,"abstract":"In face-to-face conversations, head motions play a crucial role in encoding information, and humans are very skilled at decoding multiple messages from interlocutors' head motions. It is of great importance to endow embodied conversational agents (ECAs) with the capability of conveying communicative intention through head movements. Our work is aimed at automatically synthesizing head motions for an ECA speaking Chinese. We propose to take only transcripts as input to compute head movements, based on a statistical framework. Subjective experiments are conducted to validate the proposed statistical framework. The results show that the generated head animation is able to improve human perception in terms of naturalness and demonstrate that the head animation is synchronized with the input of synthetic speech.","PeriodicalId":112642,"journal":{"name":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM International Conference on Intelligent Virtual Agents","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308532.3329445","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
In face-to-face conversation, head motion plays a crucial role in encoding information, and humans are highly skilled at decoding multiple messages from an interlocutor's head movements. It is therefore important to endow embodied conversational agents (ECAs) with the ability to convey communicative intent through head movements. Our work aims to automatically synthesize head motion for a Chinese-speaking ECA. We propose a statistical framework that computes head movements from transcripts alone. Subjective experiments validate the proposed framework: the results show that the generated head animation improves perceived naturalness and remains synchronized with the input synthetic speech.
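To make the idea of a text-driven statistical mapping concrete, below is a minimal illustrative sketch in Python of one way such a framework could be structured: per-feature Gaussian statistics over head rotations are fit from labeled examples and then sampled at synthesis time, one pose per text unit. The feature labels, the rotation parameterization (pitch/yaw/roll in degrees), and the Gaussian model are all assumptions for illustration; the abstract does not describe the paper's actual model.

```python
import random
from collections import defaultdict

# Toy "training data": (text feature, (pitch, yaw, roll)) pairs in degrees.
# The feature categories here are hypothetical; a real system would derive
# richer prosodic/linguistic features from the transcript.
TRAINING = [
    ("stressed",   (8.0,  2.0, 1.0)),
    ("stressed",   (10.0, -1.0, 0.5)),
    ("unstressed", (2.0,  0.5, 0.0)),
    ("unstressed", (1.5, -0.5, 0.2)),
    ("pause",      (0.0,  0.0, 0.0)),
]

def fit_gaussians(samples):
    """Estimate per-feature mean/std for each rotation channel."""
    by_feat = defaultdict(list)
    for feat, rot in samples:
        by_feat[feat].append(rot)
    stats = {}
    for feat, rots in by_feat.items():
        n = len(rots)
        means = tuple(sum(r[i] for r in rots) / n for i in range(3))
        stds = tuple(
            (sum((r[i] - means[i]) ** 2 for r in rots) / n) ** 0.5
            for i in range(3)
        )
        stats[feat] = (means, stds)
    return stats

def synthesize(features, stats, seed=0):
    """Sample one head rotation per text feature in the input sequence."""
    rng = random.Random(seed)
    motion = []
    for feat in features:
        means, stds = stats[feat]
        motion.append(tuple(rng.gauss(m, s) for m, s in zip(means, stds)))
    return motion

stats = fit_gaussians(TRAINING)
# A transcript reduced to a hypothetical per-word feature sequence:
print(synthesize(["stressed", "unstressed", "pause"], stats))
```

In practice the sampled poses would be interpolated into a continuous trajectory and time-aligned with the synthesized speech, which is the synchronization property the paper's subjective experiments evaluate.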