Factor exploration of gestural stroke choice in the context of ambiguous instruction utterances: challenges to synthesizing semantic gesture from speech alone

N. DePalma, J. Hodgins
{"title":"Factor exploration of gestural stroke choice in the context of ambiguous instruction utterances: challenges to synthesizing semantic gesture from speech alone","authors":"N. DePalma, J. Hodgins","doi":"10.1109/RO-MAN50785.2021.9515416","DOIUrl":null,"url":null,"abstract":"Current models of gesture synthesis focus primarily on a speech signal to synthesize gestures. In this paper, we take a critical look at this approach from the point of view of gesture’s tendency to disambiguate the verbal component of the expression. We identify and contribute an analysis of three challenge factors for these models: 1) synthesizing gesture in the presence of ambiguous utterances seems to be a overwhelmingly useful case for gesture production yet is not at present supported by present day models of gesture generation, 2) finding the best f-formation to convey spatial gestural information like gesturing directions makes a significant difference for everyday users and must be taken into account, and 3) assuming that captured human motion is a plentiful and easy source for retargeting gestural motion may not yet take into account the readability of gestures under kinematically constrained feasibility spaces.Recent approaches to generate gesture for agents[1] and robots [2] treat gesture as co-speech that is strictly dependent on verbal utterances. Evidence suggests that gesture selection may leverage task context so it is not dependent on verbal utterance only. This effect is particularly evident when attempting to generate gestures from ambiguous verbal utterances (e.g. \"You do this when you get to the fork in the road\"). Decoupling this strict dependency may allow gesture to be synthesized for the purpose of clarification of the ambiguous verbal utterance.","PeriodicalId":6854,"journal":{"name":"2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)","volume":"9 1","pages":"102-109"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RO-MAN50785.2021.9515416","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Current models of gesture synthesis rely primarily on the speech signal to synthesize gestures. In this paper, we take a critical look at this approach from the point of view of gesture's tendency to disambiguate the verbal component of the expression. We identify and contribute an analysis of three challenge factors for these models: 1) synthesizing gesture in the presence of ambiguous utterances appears to be an overwhelmingly useful case for gesture production, yet it is not supported by present-day models of gesture generation; 2) finding the best f-formation for conveying spatial gestural information, such as gesturing directions, makes a significant difference for everyday users and must be taken into account; and 3) the assumption that captured human motion is a plentiful and easy source for retargeting gestural motion may not account for the readability of gestures under kinematically constrained feasibility spaces.

Recent approaches to generating gesture for agents [1] and robots [2] treat gesture as co-speech that is strictly dependent on verbal utterances. Evidence suggests that gesture selection may leverage task context, so it is not dependent on the verbal utterance alone. This effect is particularly evident when attempting to generate gestures from ambiguous verbal utterances (e.g., "You do this when you get to the fork in the road"). Decoupling this strict dependency may allow gesture to be synthesized to clarify the ambiguous verbal utterance.
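To make the proposed decoupling concrete, the following is a minimal, hypothetical Python sketch of a gesture planner whose stroke choice is conditioned on task context in addition to the speech signal. It is not the authors' system: the names (GestureType, TaskContext, select_gesture), the crude lexical ambiguity test, and the three-way gesture taxonomy are all illustrative assumptions. A speech-only model corresponds to calling the planner with context=None.

```python
# Illustrative sketch only: gesture stroke selection conditioned on both the
# utterance and an (optional) task context, rather than on speech alone.
# All names and the ambiguity heuristic are hypothetical, not from the paper.
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class GestureType(Enum):
    BEAT = auto()     # rhythmic co-speech gesture, carries little semantics
    DEICTIC = auto()  # pointing gesture grounded in the shared environment
    ICONIC = auto()   # gesture depicting an action or shape (e.g., a turn)


@dataclass
class TaskContext:
    """Hypothetical task state available to the gesture planner."""
    # Referent name -> 2D location, e.g., {"fork_in_road": (2.0, 1.5)}.
    referents: dict[str, tuple[float, float]] = field(default_factory=dict)


# Words that often signal an under-specified verbal utterance.
AMBIGUOUS_MARKERS = {"this", "that", "it", "there", "here"}


def is_ambiguous(utterance: str) -> bool:
    """Crude lexical test for ambiguity; a real system would use a parser."""
    tokens = utterance.lower().replace(".", "").replace(",", "").split()
    return any(tok in AMBIGUOUS_MARKERS for tok in tokens)


def select_gesture(utterance: str, context: TaskContext | None) -> GestureType:
    """Pick a gesture stroke from speech plus (optionally) task context.

    A speech-only model collapses to the `context is None` branches; the
    abstract's argument is that the context branch is where the gesture can
    carry the semantics the utterance leaves out.
    """
    if is_ambiguous(utterance) and context is not None and context.referents:
        # The speech under-specifies; point at (or depict) a task referent.
        return GestureType.DEICTIC
    if is_ambiguous(utterance):
        # No context to draw on: an iconic gesture can still depict the
        # intended action (e.g., miming a left turn for "you do this").
        return GestureType.ICONIC
    # Unambiguous speech: a beat gesture suffices.
    return GestureType.BEAT


if __name__ == "__main__":
    ctx = TaskContext(referents={"fork_in_road": (2.0, 1.5)})
    print(select_gesture("You do this when you get to the fork in the road.", ctx))
    # -> GestureType.DEICTIC: the gesture, not the speech, resolves "this"
```

The design point of the sketch is only the conditioning structure: once gesture selection may read task state, an ambiguous utterance such as the paper's fork-in-the-road example can trigger a clarifying deictic or iconic stroke instead of a default beat gesture.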