Foot-constrained spatial-temporal transformer for keyframe-based complex motion synthesis

Hao Li, Ju Dai, Rui Zeng, Junxuan Bai, Zhangmeng Chen, Junjun Pan

Computer Animation and Virtual Worlds, vol. 35, no. 1. Published online 2023-09-15. DOI: 10.1002/cav.2217 (https://onlinelibrary.wiley.com/doi/10.1002/cav.2217)
Abstract
Keyframe-based motion synthesis plays a significant role in games and movies. Existing methods for complex motion synthesis often require additional post-processing to eliminate foot sliding and yield satisfactory motions. In this paper, we analyze the sliding issue and attribute it to the mismatch between the root trajectory and the motion postures. To address this problem, we propose a novel end-to-end spatial-temporal transformer network conditioned on foot contact information for high-quality keyframe-based motion synthesis. Specifically, our model mainly comprises a spatial-temporal transformer encoder and two decoders that learn motion sequence features and predict motion postures and foot contact states. A novel constrained embedding, consisting of keyframe and foot contact constraints, is incorporated into the model so that the network can learn from diversified control knowledge. To generate a root trajectory that matches the motion postures, we design a differentiable root trajectory reconstruction algorithm that constructs the root trajectory from the decoder outputs. Qualitative and quantitative experiments on the public LaFAN1, Dance, and Martial Arts datasets demonstrate the superiority of our method in generating high-quality complex motions compared with state-of-the-art approaches.
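The abstract does not spell out how the differentiable root trajectory reconstruction operates. As a rough illustration only, the sketch below shows one plausible way to recover a root trajectory from decoder outputs, exploiting the fact that a planted foot's world position stays fixed, so the root must move opposite to the contacting foot's root-relative motion. Every function name, tensor layout, and the soft contact-weighting scheme here are assumptions for illustration, not the authors' implementation.

```python
import torch

def reconstruct_root_trajectory(foot_local: torch.Tensor,
                                contact_prob: torch.Tensor,
                                root0: torch.Tensor) -> torch.Tensor:
    """Hypothetical differentiable root-trajectory reconstruction.

    foot_local:   (T, F, 3) foot positions relative to the root, per frame.
    contact_prob: (T, F)    predicted probability that each foot is planted.
    root0:        (3,)      known root position at the first keyframe.
    Returns:      (T, 3)    reconstructed root positions.
    """
    # Frame-to-frame change of each foot's root-relative position.
    delta_foot = foot_local[1:] - foot_local[:-1]            # (T-1, F, 3)
    # Soft contact weights keep the whole computation differentiable,
    # so trajectory errors can backpropagate into both decoders.
    w = contact_prob[1:].unsqueeze(-1)                       # (T-1, F, 1)
    # A planted foot is world-fixed, so the root step is the negative,
    # contact-weighted average of the foot motion; eps avoids 0/0.
    root_step = -(w * delta_foot).sum(dim=1) / (w.sum(dim=1) + 1e-6)
    # Note: in fully airborne frames the step collapses to ~0; a real
    # system would blend in a predicted root velocity there instead.
    trajectory = root0 + torch.cumsum(root_step, dim=0)      # (T-1, 3)
    return torch.cat([root0.unsqueeze(0), trajectory], dim=0)

# Toy usage: 30 frames, 2 feet.
T, F = 30, 2
foot_local = torch.randn(T, F, 3)
contact = torch.sigmoid(torch.randn(T, F))
traj = reconstruct_root_trajectory(foot_local, contact, torch.zeros(3))
print(traj.shape)  # torch.Size([30, 3])
```

Integrating per-frame root steps with a cumulative sum, rather than predicting absolute positions, is one common way to keep the reconstruction both differentiable and consistent with the postures; the paper's actual formulation may differ.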
Journal Introduction
With the advent of very powerful PCs and high-end graphics cards, there has been incredible development in Virtual Worlds, real-time computer animation and simulation, and games. At the same time, new and cheaper Virtual Reality devices have appeared, allowing interaction with these real-time Virtual Worlds, and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans, are now of exceptional quality, which makes it possible to use them in the movie industry. But this is only a beginning: with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous, and even intelligent. They will inhabit the Virtual Worlds, living a Virtual Life together with animals and plants.