Pose Guided Human Image Synthesis with Partially Decoupled GAN

Asian Conference on Machine Learning | Pub Date: 2022-10-07 | DOI: 10.48550/arXiv.2210.03627

Jianguo Wu, Jianzong Wang, Shijing Si, Xiaoyang Qu, Jing Xiao


Citations: 2

Abstract

Pose Guided Human Image Synthesis (PGHIS) is the challenging task of transforming a human image from a reference pose to a target pose while preserving its style. Most existing methods encode the texture of the whole reference image into a latent space and then use a decoder to synthesize the image texture for the target pose. However, it is difficult to recover the detailed texture of the whole human image this way. To alleviate this problem, we propose a method that decouples the human body into several parts (e.g., hair, face, hands, and feet) and then uses each of these parts to guide the synthesis of a realistic image of the person, which preserves detailed information in the generated images. In addition, we design a multi-head attention-based module for PGHIS. Because the local nature of the convolution operation makes it difficult for most convolutional neural network-based methods to model long-range dependencies, the long-range modeling capability of the attention mechanism is better suited to the pose transfer task, especially under sharp pose deformation. Extensive experiments on the Market-1501 and DeepFashion datasets show that our method almost outperforms other existing state-of-the-art methods in terms of both qualitative and quantitative metrics.
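The long-range argument can be illustrated with a minimal sketch of generic multi-head self-attention. The abstract does not describe the design of the proposed module, so this is not the authors' implementation: the function names, the toy sizes, and the omission of the usual output projection are all assumptions made for illustration. The point it shows is that every position receives a non-zero weight for every other position in a single layer, however far apart they are.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """Generic multi-head self-attention (illustrative, no output projection).

    x: n positions, each a list of d features.
    w_q, w_k, w_v: d x d projection matrices (hypothetical parameters).
    Returns (outputs of shape n x d, per-head n x n attention weights).
    """
    d = len(x[0])
    assert d % num_heads == 0, "feature size must divide evenly across heads"
    head_dim = d // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    outputs = [[] for _ in x]
    all_weights = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # Scaled dot-product scores between every pair of positions.
        scores = matmul(q_h, transpose(k_h))
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        out_h = matmul(weights, v_h)
        # Concatenate this head's output onto each position's feature vector.
        for i, row in enumerate(out_h):
            outputs[i].extend(row)
        all_weights.append(weights)
    return outputs, all_weights


def random_matrix(rows, cols, rng):
    """Small random matrix used only to exercise the sketch."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_positions, d_model, n_heads = 6, 4, 2
features = random_matrix(n_positions, d_model, rng)
out, attn = multi_head_self_attention(
    features,
    random_matrix(d_model, d_model, rng),
    random_matrix(d_model, d_model, rng),
    random_matrix(d_model, d_model, rng),
    n_heads,
)
# Weight that the first position places on the last one, in head 0.
print(attn[0][0][-1] > 0.0)
```

A convolution with a small kernel would need several stacked layers before the first and last positions could influence each other; here the attention matrix is dense after one layer, which is the property the abstract appeals to for large pose changes.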
