Real-time Translation of Upper-body Gestures to Virtual Avatars in Dissimilar Telepresence Environments.

IEEE Transactions on Visualization and Computer Graphics. Pub Date: 2025-06-05. DOI: 10.1109/TVCG.2025.3577156

Jiho Kang, Taehei Kim, Hyeshim Kim, Sung-Hee Lee



Abstract

In mixed reality (MR) avatar-mediated telepresence, avatar movement must be adjusted to convey the user's intent in a dissimilar space. This paper presents a novel neural network-based framework for translating upper-body gestures, which adjusts virtual avatar movements in dissimilar environments to accurately reflect the user's intended gestures in real time. Our framework translates a wide range of upper-body gestures, including eye gaze, deictic gestures, free-form gestures, and the transitions between them. A key feature of our framework is its ability to generate natural upper-body gestures for users of different sizes, irrespective of handedness and eye dominance, even though it is trained on data from a single person. Unlike previous methods that require paired motion between users and avatars for training, our framework uses an unpaired approach, significantly reducing training time and allowing a wider variety of motion types to be generated. These advantages were made possible by designing two separate networks: the Motion Progression Network, which interprets sparse tracking signals from the user to determine motion progression, and the Upper-body Gesture Network, which autoregressively generates the avatar's pose based on these progressions. We demonstrate the effectiveness of our framework through quantitative comparisons with state-of-the-art methods, qualitative animation results, and a user evaluation in MR telepresence scenarios.
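The two-network data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: both "networks" below are simple hand-written stand-ins, and every name, the scalar progression value, the rest and target poses, and the `gain` parameter are assumptions introduced only to show how a progression estimate could drive an autoregressive pose generator.

```python
from typing import List, Sequence

Pose = List[float]  # hypothetical: flat vector of avatar joint values


def motion_progression(tracking: Sequence[float]) -> float:
    """Stand-in for the Motion Progression Network (assumption: the real
    model is learned). Maps one frame of sparse tracking signals to a
    progression value clamped to [0, 1]."""
    mean = sum(tracking) / len(tracking)
    return max(0.0, min(1.0, mean))


def upper_body_gesture(prev_pose: Pose, progression: float,
                       rest_pose: Pose, target_pose: Pose,
                       gain: float = 0.5) -> Pose:
    """Stand-in for the Upper-body Gesture Network (assumption: the real
    model is learned). Produces the next pose from the previous pose and
    the current progression, so each output depends on the last one."""
    # Pose the avatar "should" reach at this progression, expressed in the
    # avatar's own environment (rest_pose and target_pose are hypothetical).
    desired = [r + progression * (t - r) for r, t in zip(rest_pose, target_pose)]
    # Step part of the way from the previous pose toward the desired pose.
    return [p + gain * (d - p) for p, d in zip(prev_pose, desired)]


def translate_stream(frames: Sequence[Sequence[float]],
                     rest_pose: Pose, target_pose: Pose) -> List[Pose]:
    """Runs the two stages frame by frame, feeding each generated pose
    back in as the previous pose for the next frame."""
    poses: List[Pose] = []
    prev = list(rest_pose)
    for tracking in frames:
        progression = motion_progression(tracking)
        prev = upper_body_gesture(prev, progression, rest_pose, target_pose)
        poses.append(prev)
    return poses


frames = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
poses = translate_stream(frames, rest_pose=[0.0, 0.0], target_pose=[2.0, 4.0])
```

In this toy setup the avatar's target pose is defined independently of the user's raw tracking values, which mirrors the idea that the avatar's motion is generated for its own environment rather than copied from the user's.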
