Scale-Wise Semantic Alignment Enhanced Multigrained Adaptive Fusion for Virtual Try-On

Jing Zhang; Yumo Kang; Wenxuan Liu; Zhe Wang
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 9, pp. 16909-16919
DOI: 10.1109/TNNLS.2025.3554826 · Published 2025-04-22
Article page: https://ieeexplore.ieee.org/document/10973286/
Image-based virtual try-on aims to fit garments onto a target person accurately and naturally while preserving the textural details of the garment. Inspired by the dynamic perception process of the human visual system, which transitions from global perception to local details, we propose a novel multigrained adaptive fusion network for virtual try-on, named MA-VITON. MA-VITON precisely aligns clothing semantic features with human body parts across different scales, reduces unrealistic textures caused by garment distortion, and employs coarse-to-fine clothing features to progressively guide the generation of try-on results. To achieve this, we introduce a scale-wise semantic alignment (SSA) module that extracts local features of the clothing and the target person at various scales using flexible query strategies. It learns semantic correspondences between garments and the human body in the latent space through parallel bidirectional interactions, ensuring accurate feature alignment. Additionally, we propose a multigrained adaptive fusion (MAF) module, which identifies critical garment regions using a polyscale attention mechanism and allocates more tokens to those regions to adaptively preserve intricate textural details. Extensive experiments on multiple widely used public datasets demonstrate that MA-VITON achieves outstanding performance and surpasses state-of-the-art methods. The code is publicly available at https://github.com/Max-Teapot/MA-VITON.
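To make the alignment idea concrete, the sketch below shows a minimal, single-head cross-attention between person and garment feature tokens, applied independently at a coarse and a fine scale and in both directions. This is purely illustrative: all shapes, names, and the plain scaled-dot-product attention are assumptions on my part; the paper's actual SSA module (flexible query strategies, parallel bidirectional interaction in the latent space) is more elaborate and is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(queries, keys_values):
    """One direction of alignment: each query token gathers a weighted
    mixture of the other modality's tokens. Shapes: (Nq, d) x (Nk, d) -> (Nq, d)."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (Nq, Nk) similarity
    attn = softmax(scores, axis=-1)                 # rows sum to 1
    return attn @ keys_values                       # (Nq, d) aligned features

def multiscale_align(person_feats, garment_feats):
    """Bidirectional alignment at each scale, coarse to fine (hypothetical
    stand-in for the scale-wise semantic alignment described above)."""
    aligned = []
    for p, g in zip(person_feats, garment_feats):
        g_to_p = cross_attend(p, g)   # garment features mapped onto body tokens
        p_to_g = cross_attend(g, p)   # body features mapped onto garment tokens
        aligned.append((g_to_p, p_to_g))
    return aligned

# Toy inputs: 16 coarse tokens and 64 fine tokens, 64-dim features per scale.
rng = np.random.default_rng(0)
scales = [(16, 64), (64, 64)]
person = [rng.standard_normal(s) for s in scales]
garment = [rng.standard_normal(s) for s in scales]
out = multiscale_align(person, garment)
print([pair[0].shape for pair in out])  # [(16, 64), (64, 64)]
```

The coarse scale captures global garment-body correspondence while the fine scale refines local detail, mirroring the global-to-local perception process the abstract cites as motivation.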
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.