Diffusion-based Human Motion Style Transfer with Semantic Guidance

IF 2.7 | CAS Quartile 4 (Computer Science) | JCR Q2, COMPUTER SCIENCE, SOFTWARE ENGINEERING
Lei Hu, Zihao Zhang, Yongjing Ye, Yiwen Xu, Shihong Xia
{"title":"Diffusion-based Human Motion Style Transfer with Semantic Guidance","authors":"Lei Hu,&nbsp;Zihao Zhang,&nbsp;Yongjing Ye,&nbsp;Yiwen Xu,&nbsp;Shihong Xia","doi":"10.1111/cgf.15169","DOIUrl":null,"url":null,"abstract":"<p>3D Human motion style transfer is a fundamental problem in computer graphic and animation processing. Existing AdaIN-based methods necessitate datasets with balanced style distribution and content/style labels to train the clustered latent space. However, we may encounter a single unseen style example in practical scenarios, but not in sufficient quantity to constitute a style cluster for AdaIN-based methods. Therefore, in this paper, we propose a novel two-stage framework for few-shot style transfer learning based on the diffusion model. Specifically, in the first stage, we pre-train a diffusion-based text-to-motion model as a generative prior so that it can cope with various content motion inputs. In the second stage, based on the single style example, we fine-tune the pre-trained diffusion model in a few-shot manner to make it capable of style transfer. The key idea is regarding the reverse process of diffusion as a motion-style translation process since the motion styles can be viewed as special motion variations. During the fine-tuning for style transfer, a simple yet effective semantic-guided style transfer loss coordinated with style example reconstruction loss is introduced to supervise the style transfer in CLIP semantic space. The qualitative and quantitative evaluations demonstrate that our method can achieve state-of-the-art performance and has practical applications. The source code is available at https://github.com/hlcdyy/diffusion-based-motion-style-transfer.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 8","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Graphics Forum","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/cgf.15169","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

3D human motion style transfer is a fundamental problem in computer graphics and animation processing. Existing AdaIN-based methods necessitate datasets with balanced style distribution and content/style labels to train the clustered latent space. In practical scenarios, however, we may encounter only a single unseen style example, which is not sufficient to constitute a style cluster for AdaIN-based methods. Therefore, in this paper, we propose a novel two-stage framework for few-shot style transfer learning based on the diffusion model. Specifically, in the first stage, we pre-train a diffusion-based text-to-motion model as a generative prior so that it can cope with various content motion inputs. In the second stage, based on the single style example, we fine-tune the pre-trained diffusion model in a few-shot manner to make it capable of style transfer. The key idea is to regard the reverse diffusion process as a motion-style translation process, since motion styles can be viewed as special motion variations. During the fine-tuning for style transfer, a simple yet effective semantic-guided style transfer loss, coordinated with a style example reconstruction loss, is introduced to supervise the style transfer in CLIP semantic space. The qualitative and quantitative evaluations demonstrate that our method can achieve state-of-the-art performance and has practical applications. The source code is available at https://github.com/hlcdyy/diffusion-based-motion-style-transfer.
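The abstract describes two fine-tuning objectives: a reconstruction loss on the single style example and a semantic-guided style transfer loss computed in CLIP space. The following PyTorch-style sketch shows one plausible way such a combined objective could look; the denoiser interface, the motion-to-CLIP encoder, and the weighting factor `lambda_sem` are assumptions made for illustration, not the authors' actual implementation.

```python
# Illustrative sketch (not the authors' code): combining a style-example
# reconstruction loss with a CLIP-space semantic guidance loss during the
# few-shot fine-tuning stage. All module and parameter names are hypothetical.
import torch
import torch.nn.functional as F

def fine_tune_step(denoiser, motion_clip_encoder, style_text_feat,
                   style_motion, content_motion, alpha_bar_t, lambda_sem=0.5):
    """One fine-tuning step under assumed interfaces.

    denoiser            -- stage-1 pre-trained text-to-motion model; assumed to
                           predict the clean motion x0 from a noised input.
    motion_clip_encoder -- frozen encoder mapping motions into CLIP space.
    style_text_feat     -- CLIP text feature describing the target style.
    style_motion        -- the single style example (few-shot supervision).
    content_motion      -- a neutral content motion to be stylized.
    alpha_bar_t         -- cumulative noise-schedule value for timestep t, in (0, 1).
    """
    # Style-example reconstruction loss: forward-diffuse the style example
    # using the standard closed form x_t = sqrt(a_bar)*x_0 + sqrt(1 - a_bar)*eps,
    # then ask the model to recover the clean style motion.
    eps = torch.randn_like(style_motion)
    noisy_style = alpha_bar_t.sqrt() * style_motion + (1 - alpha_bar_t).sqrt() * eps
    loss_rec = F.mse_loss(denoiser(noisy_style, alpha_bar_t), style_motion)

    # Semantic-guided style transfer loss: denoise a noised content motion and
    # pull its CLIP-space feature toward the target style description.
    eps_c = torch.randn_like(content_motion)
    noisy_content = alpha_bar_t.sqrt() * content_motion + (1 - alpha_bar_t).sqrt() * eps_c
    stylized = denoiser(noisy_content, alpha_bar_t)
    motion_feat = motion_clip_encoder(stylized)
    loss_sem = 1.0 - F.cosine_similarity(motion_feat, style_text_feat, dim=-1).mean()

    return loss_rec + lambda_sem * loss_sem
```

In the paper's formulation, the semantic guidance supervises the style transfer in CLIP semantic space while the reconstruction term anchors the model to the single style example; the cosine-distance form above is only one common way to express a CLIP-space objective and may differ from the loss used in the paper.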

Source journal

Computer Graphics Forum (Engineering & Technology - Computer Science: Software Engineering)
CiteScore: 5.80
Self-citation rate: 12.00%
Articles per year: 175
Review time: 3-6 weeks

About the journal: Computer Graphics Forum is the official journal of Eurographics, published in cooperation with Wiley-Blackwell, and is a unique, international source of information for computer graphics professionals interested in graphics developments worldwide. It is now one of the leading journals for researchers, developers and users of computer graphics in both commercial and academic environments. The journal reports on the latest developments in the field throughout the world and covers all aspects of the theory, practice and application of computer graphics.