{"title":"KF-VTON: Keypoints-Driven Flow Based Virtual Try-On Network","authors":"Zizhao Wu, Siyu Liu, Peioyan Lu, Ping Yang, Yongkang Wong, Xiaoling Gu, Mohan S. Kankanhalli","doi":"10.1145/3673903","DOIUrl":null,"url":null,"abstract":"<p>Image-based virtual try-on aims to fit a target garment to a reference person. Most existing methods are limited to solving the Garment-To-Person (G2P) try-on task that transfers a garment from a clean product image to the reference person and do not consider the Person-To-Person (P2P) try-on task that transfers a garment from a clothed person image to the reference person, which limits the practical applicability. The P2P try-on task is more challenging due to spatial discrepancies caused by different poses, body shapes, and views between the reference person and the target person. To address this issue, we propose a novel Keypoints-Driven Flow Based Virtual Try-On Network (KF-VTON) for handling both the G2P and P2P try-on tasks. Our KF-VTON has two key innovations: 1) We propose a new <i>keypoints-driven flow based deformation model</i> to warp the garment. This model establishes spatial correspondences between the target garment and reference person by combining the robustness of Thin-plate Spline (TPS) based deformation and the flexibility of appearance flow based deformation. 2) We investigate a powerful <i>Context-aware Spatially Adaptive Normalization (CSAN) generative module</i> to synthesize the final try-on image. Particularly, CSAN integrates rich contextual information with semantic parsing guidance to properly infer unobserved garment appearances. Extensive experiments demonstrate that our KF-VTON is capable of producing photo-realistic and high-fidelity try-on results for the G2P as well as P2P try-on tasks and surpasses previous state-of-the-art methods both quantitatively and qualitatively. Our code is available at https://github.com/OIUIU/KF-VTON.</p>","PeriodicalId":50937,"journal":{"name":"ACM Transactions on Multimedia Computing Communications and Applications","volume":"34 1","pages":""},"PeriodicalIF":5.2000,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Multimedia Computing Communications and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3673903","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Image-based virtual try-on aims to fit a target garment to a reference person. Most existing methods are limited to solving the Garment-To-Person (G2P) try-on task that transfers a garment from a clean product image to the reference person and do not consider the Person-To-Person (P2P) try-on task that transfers a garment from a clothed person image to the reference person, which limits the practical applicability. The P2P try-on task is more challenging due to spatial discrepancies caused by different poses, body shapes, and views between the reference person and the target person. To address this issue, we propose a novel Keypoints-Driven Flow Based Virtual Try-On Network (KF-VTON) for handling both the G2P and P2P try-on tasks. Our KF-VTON has two key innovations: 1) We propose a new keypoints-driven flow based deformation model to warp the garment. This model establishes spatial correspondences between the target garment and reference person by combining the robustness of Thin-plate Spline (TPS) based deformation and the flexibility of appearance flow based deformation. 2) We investigate a powerful Context-aware Spatially Adaptive Normalization (CSAN) generative module to synthesize the final try-on image. Particularly, CSAN integrates rich contextual information with semantic parsing guidance to properly infer unobserved garment appearances. Extensive experiments demonstrate that our KF-VTON is capable of producing photo-realistic and high-fidelity try-on results for the G2P as well as P2P try-on tasks and surpasses previous state-of-the-art methods both quantitatively and qualitatively. Our code is available at https://github.com/OIUIU/KF-VTON.
期刊介绍:
The ACM Transactions on Multimedia Computing, Communications, and Applications is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It is soliciting paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome.
TOMM is a peer-reviewed, archival journal, available in both print form and digital form. The Journal is published quarterly; with roughly 7 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure a timely publication. The transactions consists primarily of research papers. This is an archival journal and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.