Filter-deform attention GAN: constructing human motion videos from few images

The Visual Computer, Pub Date: 2024-08-26, DOI: 10.1007/s00371-024-03595-w

Jianjun Zhu, Huihuang Zhao, Yudong Zhang

Citations: 0

Abstract

Human motion transfer is challenging because human motion and clothing textures are complex and diverse. Existing methods obtain poses through 2D pose estimation, which can easily lead to non-smooth motion and artifacts. This paper therefore proposes a highly robust motion transfer model based on image deformation, called the Filter-Deform Attention Generative Adversarial Network (FDA GAN). The method can generate complex human motion videos from only a few images of a person. First, we replace traditional 2D pose estimation with a 3D pose and shape estimator to address non-smooth motion. Then, to tackle artifacts, we design a new attention mechanism and integrate it into the GAN, yielding a network that effectively extracts image features and generates human motion videos. Finally, to further transfer the style of the source person, we propose a two-stream style loss, which enhances the model's learning ability. Experimental results demonstrate that the proposed method outperforms recent methods in overall performance and across various evaluation metrics. Project page: https://github.com/mioyeah/FDA-GAN.
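
The abstract does not spell out how the two-stream style loss is defined. As a rough illustration of what a style loss summed over two feature streams can look like, the sketch below uses the Gram-matrix style loss familiar from neural style transfer. The Gram-matrix formulation, the function names, the stream weights, and the toy feature values are all assumptions made for illustration, not the formulation used in the paper.

```python
from typing import List, Sequence

# A feature map flattened to C rows (channels) of N activations each.
Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Channel-by-channel correlations, normalized by the number of activations."""
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n for row_j in features]
        for row_i in features
    ]


def style_loss(generated: Matrix, source: Matrix) -> float:
    """Mean squared difference between the Gram matrices of two feature maps."""
    g_gen, g_src = gram_matrix(generated), gram_matrix(source)
    c = len(g_gen)
    total = sum(
        (g_gen[i][j] - g_src[i][j]) ** 2 for i in range(c) for j in range(c)
    )
    return total / (c * c)


def two_stream_style_loss(
    gen_streams: Sequence[Matrix],
    src_streams: Sequence[Matrix],
    weights: Sequence[float] = (1.0, 1.0),
) -> float:
    """Weighted sum of style losses over two feature streams (hypothetical)."""
    return sum(
        w * style_loss(g, s) for w, g, s in zip(weights, gen_streams, src_streams)
    )


# Toy features: stream A has 2 channels, stream B has 1 channel.
src_a, src_b = [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]]
gen_a, gen_b = [[1.0, 1.0], [0.0, 1.0]], [[0.0, 2.0]]

print(two_stream_style_loss((src_a, src_b), (src_a, src_b)))  # 0.0
print(two_stream_style_loss((gen_a, gen_b), (src_a, src_b)))  # 0.1875
```

In this toy example the second stream contributes nothing, because `gen_b` is only a spatial rearrangement of `src_b` and a Gram matrix discards spatial layout. That property is why Gram-based losses are usually read as comparing texture or style rather than pose.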
