SSGesture: Multimodal Gesture Generation Framework for Human Animation Synthesis
Authors: Xinyi Wang, Shiguang Liu, Xu Yang
Journal: IEEE Computer Graphics and Applications (JCR Q3, Computer Science, Software Engineering; impact factor 1.7)
Published: 2025-06-06
DOI: https://doi.org/10.1109/MCG.2025.3577477
Technological innovations are reshaping animation production. As virtual characters are increasingly used in animation creation and smart assistants, a key challenge is how to automatically generate dialogue gestures. However, current approaches often overlook a wide range of modalities and their interactions, resulting in gestures with low contextual variation and noticeable jitter. To address these issues, we propose SSGesture, a novel diffusion-based framework that effectively captures cross-modal associations. Its three-layer attention structure enhances multimodal processing. We propose the first method to automatically resolve style conflicts through interpolation-based gesture style control, and we implement a unified unmarked style prompt structure via the PAAN layer. We apply the framework to intelligent virtual assistants to generate gestures for human animation synthesis and to enable various new applications. Extensive experiments and user studies demonstrate that the proposed framework substantially helps improve the efficiency of human animation production.
About the journal:
IEEE Computer Graphics and Applications (CG&A) bridges the theory and practice of computer graphics, visualization, virtual and augmented reality, and HCI. From specific algorithms to full system implementations, CG&A offers a unique combination of peer-reviewed feature articles and informal departments. Theme issues, guest-edited by leading researchers in their fields, track the latest developments and trends in computer-generated graphical content, while tutorials and surveys provide a broad overview of interesting and timely topics. Regular departments further explore the core areas of graphics and extend into topics such as usability, education, history, and opinion. In each issue, the cover story focuses on creative applications of the technology by an artist or designer. Published six times a year, CG&A is indispensable reading for people working at the leading edge of computer-generated graphics technology and its applications in everything from business to the arts.