KF-VTON: Keypoints-Driven Flow Based Virtual Try-On Network

IF 6; Zone 3, Computer Science; Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Multimedia Computing Communications and Applications Pub Date : 2024-06-19 DOI:10.1145/3673903

Zizhao Wu, Siyu Liu, Peioyan Lu, Ping Yang, Yongkang Wong, Xiaoling Gu, Mohan S. Kankanhalli


Citations: 0

Abstract

Image-based virtual try-on aims to fit a target garment to a reference person. Most existing methods are limited to the Garment-To-Person (G2P) try-on task, which transfers a garment from a clean product image to the reference person. They do not consider the Person-To-Person (P2P) try-on task, which transfers a garment from a clothed person image to the reference person, and this limits their practical applicability. The P2P try-on task is more challenging because of spatial discrepancies caused by differences in pose, body shape, and view between the reference person and the target person. To address this issue, we propose a novel Keypoints-Driven Flow Based Virtual Try-On Network (KF-VTON) that handles both the G2P and P2P try-on tasks. Our KF-VTON has two key innovations: 1) We propose a new keypoints-driven flow based deformation model to warp the garment. This model establishes spatial correspondences between the target garment and the reference person by combining the robustness of Thin-Plate Spline (TPS) based deformation with the flexibility of appearance flow based deformation. 2) We investigate a powerful Context-aware Spatially Adaptive Normalization (CSAN) generative module to synthesize the final try-on image. In particular, CSAN integrates rich contextual information with semantic parsing guidance to properly infer unobserved garment appearances. Extensive experiments demonstrate that our KF-VTON produces photo-realistic and high-fidelity try-on results for both the G2P and P2P try-on tasks and surpasses previous state-of-the-art methods both quantitatively and qualitatively. Our code is available at https://github.com/OIUIU/KF-VTON.
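The abstract's idea of combining a coarse keypoint-driven alignment with a dense appearance flow can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes a single-channel image stored as a list of rows, a backward flow that stores a per-pixel (dx, dy) offset, and a coarse stage reduced to the mean keypoint offset as a deliberately simple stand-in for a TPS fit. All function names here are hypothetical.

```python
from math import floor


def bilinear_sample(img, x, y):
    """Sample a 2-D grid (list of rows) at real-valued (x, y); zero outside."""
    h, w = len(img), len(img[0])
    x0, y0 = floor(x), floor(y)
    ax, ay = x - x0, y - y0

    def px(xi, yi):
        return img[yi][xi] if 0 <= xi < w and 0 <= yi < h else 0.0

    top = (1 - ax) * px(x0, y0) + ax * px(x0 + 1, y0)
    bot = (1 - ax) * px(x0, y0 + 1) + ax * px(x0 + 1, y0 + 1)
    return (1 - ay) * top + ay * bot


def warp_with_flow(img, flow):
    """Backward warp: output pixel (x, y) reads the source at (x + dx, y + dy)."""
    h, w = len(img), len(img[0])
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(w)]
        for y in range(h)
    ]


def coarse_flow_from_keypoints(src_kpts, dst_kpts, h, w):
    """Assumption: a pure translation (mean keypoint offset) replaces the
    TPS-style coarse alignment; a real system would fit a smoother model."""
    n = len(src_kpts)
    dx = sum(s[0] - d[0] for s, d in zip(src_kpts, dst_kpts)) / n
    dy = sum(s[1] - d[1] for s, d in zip(src_kpts, dst_kpts)) / n
    return [[(dx, dy) for _ in range(w)] for _ in range(h)]


def add_flows(coarse, residual):
    """Total flow = coarse keypoint-driven flow + dense residual flow."""
    return [
        [(c[0] + r[0], c[1] + r[1]) for c, r in zip(crow, rrow)]
        for crow, rrow in zip(coarse, residual)
    ]


# Toy garment: a single bright pixel at (x=1, y=1) on a 4x4 grid.
H, W = 4, 4
garment = [[0.0] * W for _ in range(H)]
garment[1][1] = 1.0

# One garment keypoint at (1, 1) should land on the person at (2, 2).
coarse = coarse_flow_from_keypoints([(1, 1)], [(2, 2)], H, W)
zero_residual = [[(0.0, 0.0)] * W for _ in range(H)]
warped_coarse = warp_with_flow(garment, add_flows(coarse, zero_residual))

# A half-pixel residual refines the coarse alignment at sub-pixel level.
half_residual = [[(0.5, 0.0)] * W for _ in range(H)]
warped_refined = warp_with_flow(garment, add_flows(coarse, half_residual))
```

In this toy case the coarse flow alone moves the bright pixel from (1, 1) to (2, 2), while the added residual shifts the sampling point by half a pixel, so the same output location receives an interpolated value. In a learned model the residual flow would be predicted by a network rather than set by hand.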


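Source journal

ACM Transactions on Multimedia Computing Communications and Applications (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 8.50

Self-citation rate: 5.90%

Number of publications: 285

Review time: 7.5 months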

About the journal: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group on Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The Transactions consist primarily of research papers. This is an archival journal, and it is intended that the papers will have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.