GLV: Geometric Correlation Distillation for Latent Diffusion-Enhanced Parser-Free Virtual Try-On
Authors: Chenghu Du; Junyin Wang; Kai Liu; Shengwu Xiong
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9175-9189 (Q1, Engineering, Electrical & Electronic)
DOI: 10.1109/TCSVT.2025.3556749
Publication date: 2025-04-01
URL: https://ieeexplore.ieee.org/document/10947108/
Citations: 0
Abstract
Applying knowledge distillation to virtual try-on tasks is challenging because current methods fail to fully and efficiently exploit the teacher's knowledge. In other words, existing approaches merely transfer prior knowledge to the student model via pseudo-labels generated by the teacher model, resulting in shallow knowledge representation and low training efficiency. To address these limitations, we propose a novel teacher-student architecture for parser-free virtual try-on, named GLV, which generates high-quality try-on results with realistic body details. Specifically, we propose a deformation-related prior distillation method to effectively leverage the valuable deformation information contained in the teacher warpage model. This enhances the convergence efficiency of the student warpage model, preventing it from getting stuck in a local minimum. Moreover, we are the first to propose geometric correlation distillation, which models the underlying geometric relationship between clothing and the person and transfers this relationship from the teacher to the student. This enables the student warpage model to reduce the entanglement of deformation-irrelevant features, such as color and texture. Finally, we propose a clothing-body retouching method for try-on result synthesis, which refines the denoising process in the latent space of a well-trained diffusion model, thereby preventing catastrophic forgetting. This method seamlessly transforms the parser-based inpainting synthesis paradigm into a parser-free synthesis paradigm and enables efficient convergence of the diffusion model with only fine-tuning. Extensive experiments demonstrate the generality of our approach and highlight its superiority over previous methods.
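The geometric correlation distillation idea described above can be illustrated with a minimal sketch: build a pairwise correlation matrix between clothing and person feature maps (which captures spatial geometric relationships while normalization discards appearance magnitude such as color and texture), compute it for both teacher and student, and penalize their difference. All names, shapes, and the choice of cosine correlation plus an L1 penalty are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def correlation_map(cloth_feat, person_feat):
    """Pairwise cosine correlation between clothing and person features.

    cloth_feat, person_feat: (C, H, W) feature maps from a warpage
    network (hypothetical shapes). Each spatial location's channel
    vector is unit-normalized, so the resulting (H*W, H*W) matrix
    encodes geometric relationships rather than raw feature magnitude.
    """
    c, h, w = cloth_feat.shape
    cf = cloth_feat.reshape(c, h * w)
    pf = person_feat.reshape(c, h * w)
    # Normalize channel vectors at every spatial position.
    cf = cf / (np.linalg.norm(cf, axis=0, keepdims=True) + 1e-8)
    pf = pf / (np.linalg.norm(pf, axis=0, keepdims=True) + 1e-8)
    return cf.T @ pf  # (H*W, H*W) clothing-to-person correlations

def geometric_correlation_loss(t_cloth, t_person, s_cloth, s_person):
    """Match the student's clothing-person correlation to the teacher's.

    In training, the teacher features would be detached/frozen; here
    both sides are plain arrays for clarity.
    """
    t_corr = correlation_map(t_cloth, t_person)
    s_corr = correlation_map(s_cloth, s_person)
    return np.abs(s_corr - t_corr).mean()  # L1 distillation penalty
```

Note that because the correlation matrix is invariant to per-location feature scaling, deformation-irrelevant attributes contribute less to this loss than to a direct feature-matching loss, which is the intuition behind distilling the correlation rather than the features themselves.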
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.