Filter-deform attention GAN: constructing human motion videos from few images
Jianjun Zhu, Huihuang Zhao, Yudong Zhang
The Visual Computer, published 2024-08-26 (Journal Article)
DOI: 10.1007/s00371-024-03595-w
Citations: 0
Abstract
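Human motion transfer is challenging because human motion and clothing textures are complex and diverse. Existing methods obtain poses through 2D pose estimation, which easily leads to non-smooth motion and artifacts. This paper therefore proposes a highly robust motion transfer model based on image deformation, called the Filter-Deform Attention Generative Adversarial Network (FDA GAN), which can generate videos of complex human motion from only a few images of a person. First, we replace traditional 2D pose estimation with a 3D pose and shape estimator to address non-smooth motion. Second, to tackle artifacts, we design a new attention mechanism and integrate it into the GAN, yielding a network that effectively extracts image features and generates human motion videos. Finally, to better transfer the style of the source person, we propose a two-stream style loss that strengthens the model's learning ability. Experimental results show that the proposed method outperforms recent methods both in overall performance and on various evaluation metrics. Project page: https://github.com/mioyeah/FDA-GAN.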
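The abstract names a two-stream style loss but does not define it. As a hedged illustration only, the sketch below computes the classic Gram-matrix style loss from neural style transfer on two separate feature streams and adds the results with weights. The function names, the weights `weight_a`/`weight_b`, and what each stream contains are assumptions made for this example, not the FDA GAN formulation. Features are plain nested lists so the sketch runs without a deep-learning framework.

```python
# Hypothetical sketch: Gram-matrix style loss summed over two feature streams.
# This is NOT the FDA GAN loss; the stream contents and weights are assumed.
from typing import List

Features = List[List[float]]  # shape [C][N]: C channels, N spatial positions


def gram_matrix(feats: Features) -> List[List[float]]:
    """Channel-correlation (Gram) matrix, normalised by the number of positions."""
    n = len(feats[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in feats]
        for fi in feats
    ]


def style_loss(generated: Features, reference: Features) -> float:
    """Mean squared difference between the Gram matrices of two feature maps."""
    g_gen = gram_matrix(generated)
    g_ref = gram_matrix(reference)
    c = len(g_gen)
    return sum(
        (g_gen[i][j] - g_ref[i][j]) ** 2 for i in range(c) for j in range(c)
    ) / (c * c)


def two_stream_style_loss(
    gen_a: Features, ref_a: Features,
    gen_b: Features, ref_b: Features,
    weight_a: float = 1.0, weight_b: float = 1.0,
) -> float:
    """Weighted sum of the style loss computed independently on two streams.

    Which features feed stream A and stream B is an assumption; the
    abstract does not specify it.
    """
    return weight_a * style_loss(gen_a, ref_a) + weight_b * style_loss(gen_b, ref_b)


# Usage: one channel, two positions per stream.
loss = two_stream_style_loss(
    [[1.0, 1.0]], [[2.0, 0.0]],  # stream A: Gram values 1.0 vs 2.0
    [[1.0, 1.0]], [[1.0, 1.0]],  # stream B: identical, contributes 0
    weight_a=0.5,
)
```

Because a Gram matrix sums over spatial positions, it discards where features occur and keeps only how channels co-vary. That makes a style loss of this kind sensitive to appearance statistics such as texture rather than to pose, which is why such losses are a common choice when appearance must carry over to new poses.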