SSGesture: Multimodal Gesture Generation Framework for Human Animation Synthesis.

IF 1.4 | Zone 4, Computer Science | JCR Q3, COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Computer Graphics and Applications | Pub Date: 2025-06-06 | DOI: 10.1109/MCG.2025.3577477

Xinyi Wang, Shiguang Liu, Xu Yang

Citations: 0

Abstract

Technological innovations are reshaping the development of animation production. As virtual characters are increasingly used in animation creation and smart assistants, a key challenge is how to automatically generate dialogue gestures. However, current approaches often overlook a wide range of modalities and their interactions, resulting in gestures that have low contextual variation and noticeable jitter. To address these issues, we propose SSGesture, a novel diffusion-based framework that effectively captures cross-modal associations. Our three-layer attention structure enhances multimodal processing. We propose the first method to automatically resolve style conflicts through interpolation-based gesture style control, while implementing a unified unmarked style prompt structure via the PAAN layer. Our framework is practically applied in the field of intelligent virtual assistants to generate gestures in human animation synthesis and to realize various new applications. Extensive experiments and user studies have demonstrated that our proposed framework provides substantial assistance in enhancing the efficiency of human animation production.

Source Journal

IEEE Computer Graphics and Applications (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 3.20
Self-citation rate: 5.60%
Articles published: 160
Review time: >12 weeks

Journal description: IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues guest edited by leading researchers in their fields track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics as well as extend into topics such as usability, education, history, and opinion. In each issue, the cover story focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.