Title: Hierarchical Cross-Attention Network for Virtual Try-On
Authors: Hao Tang; Bin Ren; Pingping Wu; Nicu Sebe
Journal: IEEE Transactions on Multimedia, vol. 27, pp. 4454-4466
Publication date: 2025-03-05
Publication type: Journal Article
DOI: 10.1109/TMM.2025.3548437
URL: https://ieeexplore.ieee.org/document/10912783/
Impact factor: 9.7
JCR: Q1 (Computer Science, Information Systems)
Open access: No
Citations: 0
Abstract
In this article, we present an innovative solution tailored for the intricate challenges of the virtual try-on task—our novel Hierarchical Cross-Attention Network, HCANet. HCANet is meticulously crafted with two primary stages: geometric matching and try-on, each playing a crucial role in delivering realistic and visually convincing virtual try-on outcomes. A distinctive feature of HCANet is the incorporation of a novel Hierarchical Cross-Attention (HCA) block into both stages, enabling the effective capture of long-range correlations between individual and clothing modalities. The HCA block functions as a cornerstone, enhancing the depth and robustness of the network. By adopting a hierarchical approach, it facilitates a nuanced representation of the interaction between the person and clothing, capturing intricate details essential for an authentic virtual try-on experience. Our extensive set of experiments establishes the prowess of HCANet. The results showcase its cutting-edge performance across both objective quantitative metrics and subjective evaluations of visual realism. HCANet stands out as a state-of-the-art solution, demonstrating its capability to generate virtual try-on results that not only excel in accuracy but also satisfy subjective criteria of realism. This marks a significant step forward in advancing the field of virtual try-on technologies.
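The abstract does not specify the internals of the HCA block, so the following is only a generic illustration of the underlying idea: cross-attention in which features from one modality (the person) query features from another (the clothing). It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the use of identity projections instead of learned query/key/value weights, the single attention head, and the toy feature vectors are all hypothetical, and no hierarchy is modelled.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(person_feats, cloth_feats):
    """Generic single-head cross-attention (illustrative only).

    person_feats: N query vectors of length d (hypothetical person tokens).
    cloth_feats:  M key/value vectors of length d (hypothetical clothing tokens).
    Returns (outputs, weights): N fused vectors of length d and an N x M
    matrix of attention weights. Assumption: identity projections are used,
    so there are no learned parameters.
    """
    d = len(person_feats[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in person_feats:
        # Scaled dot-product similarity between one person token and every clothing token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in cloth_feats]
        w = softmax(scores)
        # Weighted sum of clothing vectors: any clothing token can contribute,
        # regardless of spatial position, which is how long-range correlations arise.
        fused = [sum(w[j] * cloth_feats[j][c] for j in range(len(cloth_feats)))
                 for c in range(d)]
        outputs.append(fused)
        weights.append(w)
    return outputs, weights


if __name__ == "__main__":
    person = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy person tokens
    cloth = [[1.0, 0.0], [0.0, 1.0]]               # toy clothing tokens
    fused, attn = cross_attention(person, cloth)
    for row in attn:
        print([round(x, 3) for x in row])
```

Each row of the returned weight matrix sums to 1, and a person token attends most strongly to the clothing token it is most similar to. A practical block would add learned projections, multiple heads, and, presumably, the multi-level structure the paper's title refers to.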
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.