Precise region semantics-assisted GAN for pose-guided person image generation

IF 8.4 · SCI Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ji Liu, Zhenyu Weng, Yuesheng Zhu
DOI: 10.1049/cit2.12255
Journal: CAAI Transactions on Intelligence Technology, vol. 9, no. 3, pp. 665-678
Published: 2023-08-02
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12255
Citations: 0

Abstract

Generating a realistic image of a person from a source pose conditioned on a different target pose is a promising computer vision task. Previous mainstream methods mainly focus on exploring the transformation relationship between the keypoint-based source pose and the target pose, but rarely investigate region-based human semantic information. Some current methods that adopt a parsing map consider neither the precise local pose-semantic matching issue nor the correspondence between two different poses. In this study, a Region Semantics-Assisted Generative Adversarial Network (RSA-GAN) is proposed for the pose-guided person image generation task. In particular, a regional pose-guided semantic fusion module is first developed to solve the imprecise matching between the semantic parsing map of a given source image and the corresponding keypoints in the source pose. To align the style of the human in the source image well with the target pose, a pose correspondence guided style injection module is designed to learn the correspondence between the source pose and the target pose. In addition, a gated depth-wise convolutional cross-attention based style integration module is proposed to distribute the well-aligned coarse style information, together with the precisely matched pose-guided semantic information, towards the target pose. The experimental results indicate that the proposed RSA-GAN achieves a 23% reduction in LPIPS compared to methods that do not use semantic maps and a 6.9% reduction in FID compared to methods that do, and also produces more realistic qualitative results.
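The abstract does not specify how the gated depth-wise convolutional cross-attention style integration module is built. The sketch below is one plausible reading, not the authors' implementation: pose-guided semantic features act as queries over the style features, a depth-wise convolution (one kernel per channel) mixes the attended output spatially, and a sigmoid gate blends it with the semantic stream. All shapes, weight names, and the gating formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv1d(x, k):
    # x: (C, L) features, k: (C, K) one kernel per channel (depth-wise),
    # same-padding so the output length matches the input length.
    C, L = x.shape
    K = k.shape[1]
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(L):
            out[c, i] = xp[c, i:i + K] @ k[c]
    return out

def gated_dw_cross_attention(style, semantic, Wq, Wk, Wv, dw_kernel, Wg):
    """Assumed formulation: semantic features query the style features via
    cross-attention, a depth-wise conv mixes the result spatially, and a
    learned sigmoid gate decides how much of it to inject per position."""
    Q = semantic @ Wq                    # (L, d) queries from semantics
    K_ = style @ Wk                      # (L, d) keys from style
    V = style @ Wv                       # (L, d) values from style
    attn = softmax(Q @ K_.T / np.sqrt(Q.shape[1]))
    attended = attn @ V                  # (L, d) style pulled to semantics
    mixed = depthwise_conv1d(attended.T, dw_kernel).T
    gate = 1.0 / (1.0 + np.exp(-(semantic @ Wg)))  # sigmoid gate in (0, 1)
    return gate * mixed + (1.0 - gate) * semantic

# Toy usage with identity projections and a 3-tap averaging kernel.
rng = np.random.default_rng(0)
L, d = 6, 4
style = rng.normal(size=(L, d))
semantic = rng.normal(size=(L, d))
eye = np.eye(d)
out = gated_dw_cross_attention(style, semantic, eye, eye, eye,
                               np.ones((d, 3)) / 3, eye)
```

In this reading, the gate lets the network fall back to the semantic stream wherever the attended style is unreliable, which would match the "gated" integration the abstract describes.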


Source journal
CAAI Transactions on Intelligence Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 11.00
Self-citation rate: 3.90%
Articles per year: 134
Review time: 35 weeks
Journal description: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research which is openly accessible to read and share worldwide.