{"title":"Slot-VTON:基于主体驱动的扩散式虚拟试戴与插槽注意力","authors":"Jianglei Ye, Yigang Wang, Fengmao Xie, Qin Wang, Xiaoling Gu, Zizhao Wu","doi":"10.1007/s00371-024-03603-z","DOIUrl":null,"url":null,"abstract":"<p>Virtual try-on aims to transfer clothes from one image to another while preserving intricate wearer and clothing details. Tremendous efforts have been made to facilitate the task based on deep generative models such as GAN and diffusion models; however, the current methods have not taken into account the influence of the natural environment (background and unrelated impurities) on clothing image, leading to issues such as loss of detail, intricate textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot attention-based inpainting approach for seamless image generation in a subject-driven way. Specifically, we adopt an attention mechanism, termed slot attention, that can unsupervisedly separate the various subjects within images. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents a subject. Guided by the extracted clothing slot, our method is capable of eliminating the interference of other unnecessary factors, thereby better preserving the complex details of the clothing. To further enhance the seamless generation of the diffusion model, we design a fusion adapter that integrates multiple conditions, including the slot and other added clothing conditions. In addition, a non-garment inpainting module is used to further fix visible seams and preserve non-clothing area details (hands, neck, etc.). Multiple experiments on VITON-HD datasets validate the efficacy of our methods, showcasing state-of-the-art generation performances. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Slot-VTON: subject-driven diffusion-based virtual try-on with slot attention\",\"authors\":\"Jianglei Ye, Yigang Wang, Fengmao Xie, Qin Wang, Xiaoling Gu, Zizhao Wu\",\"doi\":\"10.1007/s00371-024-03603-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Virtual try-on aims to transfer clothes from one image to another while preserving intricate wearer and clothing details. Tremendous efforts have been made to facilitate the task based on deep generative models such as GAN and diffusion models; however, the current methods have not taken into account the influence of the natural environment (background and unrelated impurities) on clothing image, leading to issues such as loss of detail, intricate textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot attention-based inpainting approach for seamless image generation in a subject-driven way. Specifically, we adopt an attention mechanism, termed slot attention, that can unsupervisedly separate the various subjects within images. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents a subject. Guided by the extracted clothing slot, our method is capable of eliminating the interference of other unnecessary factors, thereby better preserving the complex details of the clothing. 
To further enhance the seamless generation of the diffusion model, we design a fusion adapter that integrates multiple conditions, including the slot and other added clothing conditions. In addition, a non-garment inpainting module is used to further fix visible seams and preserve non-clothing area details (hands, neck, etc.). Multiple experiments on VITON-HD datasets validate the efficacy of our methods, showcasing state-of-the-art generation performances. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.</p>\",\"PeriodicalId\":501186,\"journal\":{\"name\":\"The Visual Computer\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Visual Computer\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00371-024-03603-z\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03603-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Slot-VTON: subject-driven diffusion-based virtual try-on with slot attention
Virtual try-on aims to transfer a garment from one image onto the person in another while preserving intricate details of both the wearer and the clothing. Tremendous effort has been devoted to this task using deep generative models such as GANs and diffusion models; however, current methods do not account for the influence of the natural environment (background and unrelated objects) on the clothing image, leading to the loss of intricate details such as textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot-attention-based inpainting approach for seamless, subject-driven image generation. Specifically, we adopt an attention mechanism, termed slot attention, that separates the various subjects within an image without supervision. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents one subject. Guided by the extracted clothing slot, our method eliminates interference from extraneous factors and thereby better preserves the complex details of the clothing. To further improve the seamlessness of the diffusion model's output, we design a fusion adapter that integrates multiple conditions, including the slot and additional clothing conditions. In addition, a non-garment inpainting module further removes visible seams and preserves details in non-clothing areas (hands, neck, etc.). Extensive experiments on the VITON-HD dataset validate the efficacy of our method, demonstrating state-of-the-art generation performance. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.
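The slot attention the abstract builds on is the iterative, competitive attention mechanism introduced by Locatello et al. (NeurIPS 2020). Below is a minimal PyTorch sketch of that general mechanism for intuition only; it is not the authors' implementation, and all dimensions, iteration counts, and variable names are illustrative.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal slot attention in the style of Locatello et al. (2020)."""
    def __init__(self, num_slots: int, dim: int, iters: int = 3, eps: float = 1e-8):
        super().__init__()
        self.num_slots, self.iters, self.eps = num_slots, iters, eps
        self.scale = dim ** -0.5
        # Learnable Gaussian from which the initial slots are sampled.
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (B, N, dim) flattened image features, e.g. from a CNN/ViT encoder.
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)

        mu = self.slots_mu.expand(b, self.num_slots, -1)
        sigma = self.slots_logsigma.exp().expand(b, self.num_slots, -1)
        slots = mu + sigma * torch.randn_like(mu)

        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the *slot* axis makes slots compete for input
            # features; this competition drives the unsupervised decomposition.
            attn = torch.einsum('bnd,bkd->bnk', k, q) * self.scale
            attn = attn.softmax(dim=-1) + self.eps
            attn = attn / attn.sum(dim=1, keepdim=True)  # weighted mean over inputs
            updates = torch.einsum('bnk,bnd->bkd', attn, v)
            slots = self.gru(updates.reshape(-1, d),
                             slots_prev.reshape(-1, d)).reshape(b, -1, d)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots  # (B, num_slots, dim): one vector per discovered subject

# Example: 7 slots over a 32x32 = 1024-token feature map of width 64.
feats = torch.randn(2, 1024, 64)
slots = SlotAttention(num_slots=7, dim=64)(feats)  # -> (2, 7, 64)
```

Because each slot tends to bind to one subject without any segmentation labels, the slot corresponding to the garment can serve as a clean conditioning signal, which is the role the clothing slot plays in the abstract. The fusion adapter itself is described only at a high level; one plausible reading is that each condition (the clothing slot plus the other garment conditions) is projected to the denoising network's cross-attention width and concatenated as conditioning tokens. The sketch below is purely hypothetical (FusionAdapter and its dimensions are invented for illustration) and is not the paper's design.

```python
import torch
import torch.nn as nn

class FusionAdapter(nn.Module):
    """Hypothetical sketch: fuse condition embeddings into one context."""
    def __init__(self, slot_dim: int, cloth_dim: int, ctx_dim: int):
        super().__init__()
        self.slot_proj = nn.Linear(slot_dim, ctx_dim)
        self.cloth_proj = nn.Linear(cloth_dim, ctx_dim)

    def forward(self, slot_tokens: torch.Tensor, cloth_tokens: torch.Tensor) -> torch.Tensor:
        # slot_tokens: (B, K, slot_dim) from slot attention;
        # cloth_tokens: (B, T, cloth_dim) from, e.g., a garment image encoder.
        ctx = torch.cat([self.slot_proj(slot_tokens),
                         self.cloth_proj(cloth_tokens)], dim=1)
        return ctx  # (B, K+T, ctx_dim) cross-attention context for the denoiser
```

Under this reading, the fused context would be passed as the cross-attention conditioning of the inpainting diffusion model at each denoising step; the actual integration in Slot-VTON may differ.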