Precise region semantics-assisted GAN for pose-guided person image generation

IF 8.4 · SCI Region 2 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ji Liu, Zhenyu Weng, Yuesheng Zhu
DOI: 10.1049/cit2.12255
Journal: CAAI Transactions on Intelligence Technology, vol. 9, no. 3, pp. 665-678
Published: 2023-08-02
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12255
Citations: 0

Abstract

Generating a realistic image of a person from a source pose conditioned on a different target pose is a promising computer vision task. Previous mainstream methods mainly focus on exploring the transformation relationship between the keypoint-based source pose and the target pose, but rarely investigate region-based human semantic information. Some current methods that adopt a parsing map consider neither the precise local pose-semantic matching issue nor the correspondence between two different poses. In this study, a Region Semantics-Assisted Generative Adversarial Network (RSA-GAN) is proposed for the pose-guided person image generation task. In particular, a regional pose-guided semantic fusion module is first developed to solve the imprecise matching between the semantic parsing map of a given source image and the corresponding keypoints in the source pose. To align the style of the human in the source image well with the target pose, a pose correspondence guided style injection module is designed to learn the correspondence between the source pose and the target pose. In addition, a gated depth-wise convolutional cross-attention based style integration module is proposed to distribute the well-aligned coarse style information, together with the precisely matched pose-guided semantic information, towards the target pose. The experimental results indicate that the proposed RSA-GAN achieves a 23% reduction in LPIPS compared to methods that do not use semantic maps and a 6.9% reduction in FID compared to methods that do, and also produces more realistic qualitative results.
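The abstract does not specify how the gated depth-wise convolutional cross-attention style integration module is built. The sketch below is one plausible reading, not the authors' implementation: pose-guided semantic features act as queries over the style features, a depth-wise convolution (one kernel per channel) mixes the attended output spatially, and a sigmoid gate blends it with the semantic stream. All shapes, weight names, and the gating formulation are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def depthwise_conv1d(x, k):
    # x: (C, L) features, k: (C, K) one kernel per channel (depth-wise),
    # same-padding so the output length matches the input length.
    C, L = x.shape
    K = k.shape[1]
    pad = K // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(L):
            out[c, i] = xp[c, i:i + K] @ k[c]
    return out

def gated_dw_cross_attention(style, semantic, Wq, Wk, Wv, dw_kernel, Wg):
    """Assumed formulation: semantic features query the style features via
    cross-attention, a depth-wise conv mixes the result spatially, and a
    learned sigmoid gate decides how much of it to inject per position."""
    Q = semantic @ Wq                    # (L, d) queries from semantics
    K_ = style @ Wk                      # (L, d) keys from style
    V = style @ Wv                       # (L, d) values from style
    attn = softmax(Q @ K_.T / np.sqrt(Q.shape[1]))
    attended = attn @ V                  # (L, d) style pulled to semantics
    mixed = depthwise_conv1d(attended.T, dw_kernel).T
    gate = 1.0 / (1.0 + np.exp(-(semantic @ Wg)))  # sigmoid gate in (0, 1)
    return gate * mixed + (1.0 - gate) * semantic

# Toy usage with identity projections and a 3-tap averaging kernel.
rng = np.random.default_rng(0)
L, d = 6, 4
style = rng.normal(size=(L, d))
semantic = rng.normal(size=(L, d))
eye = np.eye(d)
out = gated_dw_cross_attention(style, semantic, eye, eye, eye,
                               np.ones((d, 3)) / 3, eye)
```

In this reading, the gate lets the network fall back to the semantic stream wherever the attended style is unreliable, which would match the "gated" integration the abstract describes.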


Source journal
CAAI Transactions on Intelligence Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
CiteScore: 11.00
Self-citation rate: 3.90%
Articles per year: 134
Review time: 35 weeks
Journal description: CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research which is openly accessible to read and share worldwide.