Scale-Wise Semantic Alignment Enhanced Multigrained Adaptive Fusion for Virtual Try-On

Jing Zhang; Yumo Kang; Wenxuan Liu; Zhe Wang
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 9, pp. 16909-16919
DOI: 10.1109/TNNLS.2025.3554826 · Published 2025-04-22
Article page: https://ieeexplore.ieee.org/document/10973286/
Image-based virtual try-on aims to fit garments onto a target person accurately and naturally while preserving the textural details of the garment. Inspired by the dynamic perception process of the human visual system, which transitions from global perception to local details, we propose a novel multigrained adaptive fusion network for virtual try-on, named MA-VITON. MA-VITON precisely aligns clothing semantic features with human body parts across different scales, reduces unrealistic textures caused by garment distortion, and employs coarse-to-fine clothing features to progressively guide the generation of try-on results. To achieve this, we introduce a scale-wise semantic alignment (SSA) module that extracts local features of the clothing and the target person at various scales using flexible query strategies. It learns semantic correspondences between garments and the human body in the latent space through parallel bidirectional interactions, ensuring accurate feature alignment. Additionally, we propose a multigrained adaptive fusion (MAF) module, which identifies critical garment regions using a polyscale attention mechanism and allocates more tokens to those regions to adaptively preserve intricate textural details. Extensive experiments on multiple widely used public datasets demonstrate that MA-VITON achieves outstanding performance and surpasses state-of-the-art methods. The code is publicly available at https://github.com/Max-Teapot/MA-VITON.
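To make the alignment idea concrete, the sketch below shows a minimal, single-head cross-attention between person and garment feature tokens, applied independently at a coarse and a fine scale and in both directions. This is purely illustrative: all shapes, names, and the plain scaled-dot-product attention are assumptions on my part; the paper's actual SSA module (flexible query strategies, parallel bidirectional interaction in the latent space) is more elaborate and is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """One direction of alignment: each query token gathers a weighted
    mixture of the other modality's tokens. Shapes: (Nq, d) x (Nk, d) -> (Nq, d)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (Nq, Nk) similarity
    attn = softmax(scores, axis=-1)                 # rows sum to 1
    return attn @ keys_values                       # (Nq, d) aligned features

def multiscale_align(person_feats, garment_feats):
    """Bidirectional alignment at each scale, coarse to fine (hypothetical
    stand-in for the scale-wise semantic alignment described above)."""
    aligned = []
    for p, g in zip(person_feats, garment_feats):
        g_to_p = cross_attend(p, g)   # garment features mapped onto body tokens
        p_to_g = cross_attend(g, p)   # body features mapped onto garment tokens
        aligned.append((g_to_p, p_to_g))
    return aligned

# Toy inputs: 16 coarse tokens and 64 fine tokens, 64-dim features per scale.
rng = np.random.default_rng(0)
scales = [(16, 64), (64, 64)]
person = [rng.standard_normal(s) for s in scales]
garment = [rng.standard_normal(s) for s in scales]
out = multiscale_align(person, garment)
print([pair[0].shape for pair in out])  # [(16, 64), (64, 64)]
```

The coarse scale captures global garment-body correspondence while the fine scale refines local detail, mirroring the global-to-local perception process the abstract cites as motivation.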
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.