{"title":"Speech-Driven Animation Constrained by Appropriate Discourse Functions","authors":"Najmeh Sadoughi, Yang Liu, C. Busso","doi":"10.1145/2663204.2663252","DOIUrl":null,"url":null,"abstract":"Conversational agents provide powerful opportunities to interact and engage with the users. The challenge is how to create naturalistic behaviors that replicate the complex gestures observed during human interactions. Previous studies have used rule-based frameworks or data-driven models to generate appropriate gestures, which are properly synchronized with the underlying discourse functions. Among these methods, speech-driven approaches are especially appealing given the rich information conveyed on speech. It captures emotional cues and prosodic patterns that are important to synthesize behaviors (i.e., modeling the variability and complexity of the timings of the behaviors). The main limitation of these models is that they fail to capture the underlying semantic and discourse functions of the message (e.g., nodding). This study proposes a speech-driven framework that explicitly model discourse functions, bridging the gap between speech-driven and rule-based models. The approach is based on dynamic Bayesian Network (DBN), where an additional node is introduced to constrain the models by specific discourse functions. We implement the approach by synthesizing head and eyebrow motion. We conduct perceptual evaluations to compare the animations generated using the constrained and unconstrained models.","PeriodicalId":389037,"journal":{"name":"Proceedings of the 16th International Conference on Multimodal Interaction","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 16th International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2663204.2663252","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 17
Abstract
Conversational agents provide powerful opportunities to interact and engage with users. The challenge is how to create naturalistic behaviors that replicate the complex gestures observed during human interactions. Previous studies have used rule-based frameworks or data-driven models to generate appropriate gestures that are properly synchronized with the underlying discourse functions. Among these methods, speech-driven approaches are especially appealing given the rich information conveyed in speech. Speech captures emotional cues and prosodic patterns that are important for synthesizing behaviors (i.e., modeling the variability and complexity of behavior timings). The main limitation of these models is that they fail to capture the underlying semantic and discourse functions of the message (e.g., nodding). This study proposes a speech-driven framework that explicitly models discourse functions, bridging the gap between speech-driven and rule-based models. The approach is based on a dynamic Bayesian network (DBN), in which an additional node is introduced to constrain the models by specific discourse functions. We implement the approach by synthesizing head and eyebrow motion. We conduct perceptual evaluations to compare the animations generated with the constrained and unconstrained models.
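To illustrate the core idea of the abstract, the sketch below shows a toy dynamic Bayesian network over hidden head-motion states whose transitions are additionally conditioned on a discourse-function node, so the same speech observations can yield different motion sequences under different discourse constraints. This is a minimal, self-contained assumption-laden sketch in NumPy only: the state counts, discretized speech observations, random probability tables, and the Viterbi decoding step are all illustrative choices, not the paper's actual model (which uses learned CPDs, continuous speech features, and also drives eyebrow motion).

```python
# Minimal sketch: a discourse-constrained DBN over hidden head-motion states.
# All names, sizes, and probability tables are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_hidden = 4   # hidden head-motion clusters (assumed)
n_speech = 3   # discretized prosody observations (assumed)
n_disc = 2     # discourse functions, e.g. 0 = statement, 1 = affirmation (assumed)

# p(h_t | h_{t-1}, d_t): one transition matrix per discourse function,
# so the discourse node constrains which motion states are reachable.
trans = rng.dirichlet(np.ones(n_hidden), size=(n_disc, n_hidden))

# p(s_t | h_t): emission of the discretized speech observation.
emit = rng.dirichlet(np.ones(n_speech), size=n_hidden)

prior = np.full(n_hidden, 1.0 / n_hidden)

def decode(speech, discourse):
    """Viterbi decoding of the most likely head-motion state sequence,
    given a discretized speech sequence and a per-frame discourse label."""
    T = len(speech)
    delta = np.zeros((T, n_hidden))
    back = np.zeros((T, n_hidden), dtype=int)
    delta[0] = np.log(prior) + np.log(emit[:, speech[0]])
    for t in range(1, T):
        # scores[i, j] = best log-prob of reaching state j from state i,
        # using the transition matrix selected by the discourse node.
        scores = delta[t - 1][:, None] + np.log(trans[discourse[t]])
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit[:, speech[t]])
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states

# The same speech sequence decoded under two different discourse constraints
# generally produces different head-motion state sequences.
speech_seq = rng.integers(0, n_speech, size=10)
print(decode(speech_seq, discourse=np.zeros(10, dtype=int)))  # "statement"
print(decode(speech_seq, discourse=np.ones(10, dtype=int)))   # "affirmation"
```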