{"title":"用于生成人物图像的多尺度跨域配准","authors":"Liyuan Ma, Tingwei Gao, Haibin Shen, Kejie Huang","doi":"10.1049/cit2.12224","DOIUrl":null,"url":null,"abstract":"<p>Person image generation aims to generate images that maintain the original human appearance in different target poses. Recent works have revealed that the critical element in achieving this task is the alignment of appearance domain and pose domain. Previous alignment methods, such as appearance flow warping, correspondence learning and cross attention, often encounter challenges when it comes to producing fine texture details. These approaches suffer from limitations in accurately estimating appearance flows due to the lack of global receptive field. Alternatively, they can only perform cross-domain alignment on high-level feature maps with small spatial dimensions since the computational complexity increases quadratically with larger feature sizes. In this article, the significance of multi-scale alignment, in both low-level and high-level domains, for ensuring reliable cross-domain alignment of appearance and pose is demonstrated. To this end, a novel and effective method, named Multi-scale Cross-domain Alignment (MCA) is proposed. Firstly, MCA adopts global context aggregation transformer to model multi-scale interaction between pose and appearance inputs, which employs pair-wise window-based cross attention. Furthermore, leveraging the integrated global source information for each target position, MCA applies flexible flow prediction head and point correlation to effectively conduct warping and fusing for final transformed person image generation. Our proposed MCA achieves superior performance on two popular datasets than other methods, which verifies the effectiveness of our approach.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 2","pages":"374-387"},"PeriodicalIF":8.4000,"publicationDate":"2023-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12224","citationCount":"0","resultStr":"{\"title\":\"Multi-scale cross-domain alignment for person image generation\",\"authors\":\"Liyuan Ma, Tingwei Gao, Haibin Shen, Kejie Huang\",\"doi\":\"10.1049/cit2.12224\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Person image generation aims to generate images that maintain the original human appearance in different target poses. Recent works have revealed that the critical element in achieving this task is the alignment of appearance domain and pose domain. Previous alignment methods, such as appearance flow warping, correspondence learning and cross attention, often encounter challenges when it comes to producing fine texture details. These approaches suffer from limitations in accurately estimating appearance flows due to the lack of global receptive field. Alternatively, they can only perform cross-domain alignment on high-level feature maps with small spatial dimensions since the computational complexity increases quadratically with larger feature sizes. In this article, the significance of multi-scale alignment, in both low-level and high-level domains, for ensuring reliable cross-domain alignment of appearance and pose is demonstrated. To this end, a novel and effective method, named Multi-scale Cross-domain Alignment (MCA) is proposed. 
Firstly, MCA adopts global context aggregation transformer to model multi-scale interaction between pose and appearance inputs, which employs pair-wise window-based cross attention. Furthermore, leveraging the integrated global source information for each target position, MCA applies flexible flow prediction head and point correlation to effectively conduct warping and fusing for final transformed person image generation. Our proposed MCA achieves superior performance on two popular datasets than other methods, which verifies the effectiveness of our approach.</p>\",\"PeriodicalId\":46211,\"journal\":{\"name\":\"CAAI Transactions on Intelligence Technology\",\"volume\":\"9 2\",\"pages\":\"374-387\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2023-05-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12224\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CAAI Transactions on Intelligence Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12224\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CAAI Transactions on Intelligence Technology","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/cit2.12224","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Multi-scale cross-domain alignment for person image generation
Abstract: Person image generation aims to generate images that maintain the original human appearance in different target poses. Recent works have revealed that the critical element in achieving this task is the alignment of the appearance and pose domains. Previous alignment methods, such as appearance flow warping, correspondence learning and cross attention, often struggle to produce fine texture details. They either estimate appearance flows inaccurately because they lack a global receptive field, or can only perform cross-domain alignment on high-level feature maps with small spatial dimensions, since the computational complexity grows quadratically with feature size. In this article, the significance of multi-scale alignment, in both low-level and high-level feature domains, for ensuring reliable cross-domain alignment of appearance and pose is demonstrated. To this end, a novel and effective method, named Multi-scale Cross-domain Alignment (MCA), is proposed. First, MCA adopts a global context aggregation transformer, which employs pair-wise window-based cross attention, to model multi-scale interaction between the pose and appearance inputs. Then, leveraging the aggregated global source information at each target position, MCA applies a flexible flow-prediction head and point correlation to warp and fuse the source features for the final transformed person image. The proposed MCA outperforms other methods on two popular datasets, which verifies the effectiveness of the approach.
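As a rough illustration of the two mechanisms the abstract describes, the PyTorch sketch below implements pair-wise window-based cross attention between pose and appearance feature maps, followed by a simple flow-prediction head that warps the appearance features with bilinear sampling. This is not the authors' implementation; every module name, window size and channel count here is an illustrative assumption.

```python
# Minimal sketch (illustrative, not the paper's code): window-based cross
# attention for pose/appearance alignment, plus flow prediction and warping.
import torch
import torch.nn as nn
import torch.nn.functional as F


def window_partition(x, ws):
    """Split a (B, C, H, W) map into non-overlapping ws x ws windows,
    returning (B * H/ws * W/ws, ws*ws, C)."""
    B, C, H, W = x.shape
    x = x.view(B, C, H // ws, ws, W // ws, ws)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, C)


class WindowCrossAttention(nn.Module):
    """Cross attention computed within matching windows: pose tokens query
    appearance tokens, so the cost grows with the number of windows rather
    than quadratically with the full feature size."""
    def __init__(self, dim, ws=8, heads=4):
        super().__init__()
        self.ws, self.heads = ws, heads
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, pose_feat, app_feat):
        B, C, H, W = pose_feat.shape
        q = self.q(window_partition(pose_feat, self.ws))            # (Nw, L, C)
        k, v = self.kv(window_partition(app_feat, self.ws)).chunk(2, dim=-1)
        Nw, L, _ = q.shape
        hd = C // self.heads
        q = q.view(Nw, L, self.heads, hd).transpose(1, 2)
        k = k.view(Nw, L, self.heads, hd).transpose(1, 2)
        v = v.view(Nw, L, self.heads, hd).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) / hd ** 0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(Nw, L, C)
        out = self.proj(out)
        # fold the windows back into a (B, C, H, W) feature map
        out = out.view(B, H // self.ws, W // self.ws, self.ws, self.ws, C)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)


class FlowWarp(nn.Module):
    """Predict a 2-channel flow from the aligned features and use it to warp
    the appearance map with bilinear sampling."""
    def __init__(self, dim):
        super().__init__()
        self.flow_head = nn.Conv2d(dim, 2, kernel_size=3, padding=1)

    def forward(self, aligned_feat, app_feat):
        B, _, H, W = app_feat.shape
        flow = self.flow_head(aligned_feat)                          # (B, 2, H, W)
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).to(app_feat)            # (H, W, 2)
        grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)          # (B, H, W, 2)
        return F.grid_sample(app_feat, grid, align_corners=True)


# toy usage on a single feature scale
pose = torch.randn(1, 64, 32, 32)
app = torch.randn(1, 64, 32, 32)
aligned = WindowCrossAttention(64)(pose, app)
warped = FlowWarp(64)(aligned, app)
print(warped.shape)  # torch.Size([1, 64, 32, 32])
```

Because the attention is restricted to windows, the same block could in principle be applied at larger, lower-level feature scales as well as small high-level ones, which is the multi-scale property the abstract emphasises.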
Journal introduction:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. It is a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI), providing research that is openly accessible to read and share worldwide.