{"title":"TED-culture: culturally inclusive co-speech gesture generation for embodied social agents.","authors":"Yixin Shen, Wafa Johal","doi":"10.3389/frobt.2025.1546765","DOIUrl":null,"url":null,"abstract":"<p><p>Generating natural and expressive co-speech gestures for conversational virtual agents and social robots is crucial for enhancing their acceptability and usability in real-world contexts. However, this task is complicated by strong cultural and linguistic influences on gesture patterns, exacerbated by the limited availability of cross-cultural co-speech gesture datasets. To address this gap, we introduce the TED-Culture Dataset, a novel dataset derived from TED talks, designed to enable cross-cultural gesture generation based on linguistic cues. We propose a generative model based on the Stable Diffusion architecture, which we evaluate on both the TED-Expressive Dataset and the TED-Culture Dataset. The model is further implemented on the NAO robot to assess real-time performance. Our model surpasses state-of-the-art baselines in gesture naturalness and exhibits rapid convergence across languages, specifically Indonesian, Japanese, and Italian. Objective and subjective evaluations confirm improvements in communicative effectiveness. Notably, results reveal that individuals are more critical of gestures in their native language, expecting higher generative performance in familiar linguistic contexts. By releasing the TED-Culture Dataset, we facilitate future research on multilingual gesture generation for embodied agents. The study underscores the importance of cultural and linguistic adaptation in co-speech gesture synthesis, with implications for human-robot interaction design.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"12 ","pages":"1546765"},"PeriodicalIF":2.9000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12011587/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Robotics and AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frobt.2025.1546765","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0
Abstract
Generating natural and expressive co-speech gestures for conversational virtual agents and social robots is crucial for enhancing their acceptability and usability in real-world contexts. However, this task is complicated by strong cultural and linguistic influences on gesture patterns, exacerbated by the limited availability of cross-cultural co-speech gesture datasets. To address this gap, we introduce the TED-Culture Dataset, a novel dataset derived from TED talks, designed to enable cross-cultural gesture generation based on linguistic cues. We propose a generative model based on the Stable Diffusion architecture, which we evaluate on both the TED-Expressive Dataset and the TED-Culture Dataset. The model is further implemented on the NAO robot to assess real-time performance. Our model surpasses state-of-the-art baselines in gesture naturalness and exhibits rapid convergence across languages, specifically Indonesian, Japanese, and Italian. Objective and subjective evaluations confirm improvements in communicative effectiveness. Notably, results reveal that individuals are more critical of gestures in their native language, expecting higher generative performance in familiar linguistic contexts. By releasing the TED-Culture Dataset, we facilitate future research on multilingual gesture generation for embodied agents. The study underscores the importance of cultural and linguistic adaptation in co-speech gesture synthesis, with implications for human-robot interaction design.
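The abstract does not detail the model internals, but the Stable Diffusion-style approach it names boils down to iteratively denoising a noisy pose sequence conditioned on an encoding of the input speech text. The sketch below illustrates that general idea for gesture generation; it is a minimal, hypothetical example (the `denoiser` network, tensor shapes, and noise schedule are assumptions for illustration, not the authors' implementation).

```python
# A minimal, hypothetical sketch of diffusion-based gesture sampling
# (DDPM-style ancestral sampling). The `denoiser` network, tensor shapes,
# and noise schedule below are illustrative assumptions, not the paper's
# actual implementation.
import torch

T = 50                                   # number of denoising steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def generate_gestures(denoiser, text_emb, seq_len=64, joint_dim=42):
    """Sample a gesture clip (seq_len frames x joint_dim pose values) by
    iteratively denoising Gaussian noise, conditioned on the speech text."""
    x = torch.randn(1, seq_len, joint_dim)            # start from pure noise
    for t in reversed(range(T)):
        # predict the noise component at step t, conditioned on the text
        eps = denoiser(x, torch.tensor([t]), text_emb)
        coef = (1.0 - alphas[t]) / (1.0 - alpha_bars[t]).sqrt()
        x = (x - coef * eps) / alphas[t].sqrt()       # posterior mean estimate
        if t > 0:                                     # add noise except at t=0
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x  # final pose sequence, to be retargeted to the agent's joints
```

In a deployed system the sampled pose sequence would then be retargeted to the embodied agent's joint space, for example the NAO robot used in the paper's real-time evaluation.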
About the journal:
Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.