Title: High-speed and precise virtual try-on with two-stage semantic segmentation and a latent consistency model for optimized diffusion processes
Authors: Sangyeop Baek, Jong Taek Lee
Journal: ETRI Journal, volume 47, issue 5, pages 881-892
Publication date: 2025-10-07
Publication type: Journal Article
DOI: 10.4218/etrij.2024-0592
URL: https://onlinelibrary.wiley.com/doi/10.4218/etrij.2024-0592
PDF: https://onlinelibrary.wiley.com/doi/epdf/10.4218/etrij.2024-0592
Impact factor: 1.6 (JCR Q3, Engineering, Electrical & Electronic)
Citation count: 0
Abstract
This work tests the hypothesis that the primary bottleneck for visual quality in virtual try-on (VTON) systems is the precision of input segmentation masks, rather than generative capability. VTON technology empowers users to dress digital models in desired clothing items virtually. Conventional VTON models rely on segmentation models to isolate clothing regions and diffusion models to synthesize complete VTON images. This paper introduces high-speed and precise VTON (HSP-VTON) as a framework that uniquely combines refined two-stage semantic segmentation for enhanced accuracy with a latent consistency model to accelerate the diffusion-based image generation process. The synergistic integration of these components for VTON addresses critical challenges in both precision and speed. Experimental results on the ATR dataset demonstrate a 2.8% improvement in mean intersection over union compared with existing methods. Furthermore, HSP-VTON achieves superior performance on the VITON-HD dataset, outperforming state-of-the-art VTON models. The latent consistency model also reduces the number of inference steps, leading to substantial time savings without compromising image quality.
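For readers unfamiliar with the segmentation metric cited in the abstract, the following is a minimal sketch of mean intersection over union (mIoU) in plain Python. It is an illustration of the general metric, not the authors' evaluation code: the function name `mean_iou`, the toy masks, and the three-label mapping are hypothetical, and the convention of skipping classes absent from both masks is one common choice that the paper may or may not follow.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average per-class IoU over classes that appear in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes  # pixels where both masks agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both masks are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened 2x3 masks; the label mapping is hypothetical:
# 0 = background, 1 = upper clothes, 2 = pants.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoU values are 1, 1/2, and 2/3, so a single mislabeled pixel at a garment boundary already lowers the mean noticeably, which is the kind of mask error the abstract's hypothesis is concerned with.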
About the journal:
ETRI Journal is an international, peer-reviewed multidisciplinary journal published bimonthly in English. The main focus of the journal is to provide an open forum to exchange innovative ideas and technology in the fields of information, telecommunications, and electronics.
Key topics of interest include high-performance computing, big data analytics, cloud computing, multimedia technology, communication networks and services, wireless communications and mobile computing, material and component technology, as well as security.
With an international editorial committee and experts from around the world as reviewers, ETRI Journal publishes high-quality research papers on the latest and best developments from the global community.