{"title":"Slot-VTON:基于主体驱动的扩散式虚拟试戴与插槽注意力","authors":"Jianglei Ye, Yigang Wang, Fengmao Xie, Qin Wang, Xiaoling Gu, Zizhao Wu","doi":"10.1007/s00371-024-03603-z","DOIUrl":null,"url":null,"abstract":"<p>Virtual try-on aims to transfer clothes from one image to another while preserving intricate wearer and clothing details. Tremendous efforts have been made to facilitate the task based on deep generative models such as GAN and diffusion models; however, the current methods have not taken into account the influence of the natural environment (background and unrelated impurities) on clothing image, leading to issues such as loss of detail, intricate textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot attention-based inpainting approach for seamless image generation in a subject-driven way. Specifically, we adopt an attention mechanism, termed slot attention, that can unsupervisedly separate the various subjects within images. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents a subject. Guided by the extracted clothing slot, our method is capable of eliminating the interference of other unnecessary factors, thereby better preserving the complex details of the clothing. To further enhance the seamless generation of the diffusion model, we design a fusion adapter that integrates multiple conditions, including the slot and other added clothing conditions. In addition, a non-garment inpainting module is used to further fix visible seams and preserve non-clothing area details (hands, neck, etc.). Multiple experiments on VITON-HD datasets validate the efficacy of our methods, showcasing state-of-the-art generation performances. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Slot-VTON: subject-driven diffusion-based virtual try-on with slot attention\",\"authors\":\"Jianglei Ye, Yigang Wang, Fengmao Xie, Qin Wang, Xiaoling Gu, Zizhao Wu\",\"doi\":\"10.1007/s00371-024-03603-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Virtual try-on aims to transfer clothes from one image to another while preserving intricate wearer and clothing details. Tremendous efforts have been made to facilitate the task based on deep generative models such as GAN and diffusion models; however, the current methods have not taken into account the influence of the natural environment (background and unrelated impurities) on clothing image, leading to issues such as loss of detail, intricate textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot attention-based inpainting approach for seamless image generation in a subject-driven way. Specifically, we adopt an attention mechanism, termed slot attention, that can unsupervisedly separate the various subjects within images. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents a subject. Guided by the extracted clothing slot, our method is capable of eliminating the interference of other unnecessary factors, thereby better preserving the complex details of the clothing. 
To further enhance the seamless generation of the diffusion model, we design a fusion adapter that integrates multiple conditions, including the slot and other added clothing conditions. In addition, a non-garment inpainting module is used to further fix visible seams and preserve non-clothing area details (hands, neck, etc.). Multiple experiments on VITON-HD datasets validate the efficacy of our methods, showcasing state-of-the-art generation performances. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.</p>\",\"PeriodicalId\":501186,\"journal\":{\"name\":\"The Visual Computer\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Visual Computer\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00371-024-03603-z\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03603-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Slot-VTON: subject-driven diffusion-based virtual try-on with slot attention
Virtual try-on aims to transfer a garment from one image onto the person in another while preserving intricate details of both the wearer and the clothing. Tremendous effort has been devoted to this task using deep generative models such as GANs and diffusion models; however, current methods do not account for the influence of the natural environment (background and unrelated objects) on the clothing image, leading to the loss of intricate details such as textures, shadows, and folds. In this paper, we introduce Slot-VTON, a slot-attention-based inpainting approach for seamless, subject-driven image generation. Specifically, we adopt an attention mechanism, termed slot attention, that separates the various subjects within an image without supervision. With slot attention, we distill the clothing image into a series of slot representations, where each slot represents one subject. Guided by the extracted clothing slot, our method eliminates interference from extraneous factors and thereby better preserves the complex details of the clothing. To further improve the seamlessness of the diffusion model's output, we design a fusion adapter that integrates multiple conditions, including the slot and additional clothing conditions. In addition, a non-garment inpainting module further removes visible seams and preserves details in non-clothing areas (hands, neck, etc.). Extensive experiments on the VITON-HD dataset validate the efficacy of our method, demonstrating state-of-the-art generation performance. Our implementation is available at: https://github.com/SilverLakee/Slot-VTON.
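The slot attention the abstract builds on is the iterative, competitive attention mechanism introduced by Locatello et al. (NeurIPS 2020). Below is a minimal PyTorch sketch of that general mechanism for intuition only; it is not the authors' implementation, and all dimensions, iteration counts, and variable names are illustrative.

```python
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    """Minimal slot attention in the style of Locatello et al. (2020)."""
    def __init__(self, num_slots: int, dim: int, iters: int = 3, eps: float = 1e-8):
        super().__init__()
        self.num_slots, self.iters, self.eps = num_slots, iters, eps
        self.scale = dim ** -0.5
        # Learnable Gaussian from which the initial slots are sampled.
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (B, N, dim) flattened image features, e.g. from a CNN/ViT encoder.
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)

        mu = self.slots_mu.expand(b, self.num_slots, -1)
        sigma = self.slots_logsigma.exp().expand(b, self.num_slots, -1)
        slots = mu + sigma * torch.randn_like(mu)

        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the *slot* axis makes slots compete for input
            # features; this competition drives the unsupervised decomposition.
            attn = torch.einsum('bnd,bkd->bnk', k, q) * self.scale
            attn = attn.softmax(dim=-1) + self.eps
            attn = attn / attn.sum(dim=1, keepdim=True)  # weighted mean over inputs
            updates = torch.einsum('bnk,bnd->bkd', attn, v)
            slots = self.gru(updates.reshape(-1, d),
                             slots_prev.reshape(-1, d)).reshape(b, -1, d)
            slots = slots + self.mlp(self.norm_mlp(slots))
        return slots  # (B, num_slots, dim): one vector per discovered subject

# Example: 7 slots over a 32x32 = 1024-token feature map of width 64.
feats = torch.randn(2, 1024, 64)
slots = SlotAttention(num_slots=7, dim=64)(feats)  # -> (2, 7, 64)
```

Because each slot tends to bind to one subject without any segmentation labels, the slot corresponding to the garment can serve as a clean conditioning signal, which is the role the clothing slot plays in the abstract. The fusion adapter itself is described only at a high level; one plausible reading is that each condition (the clothing slot plus the other garment conditions) is projected to the denoising network's cross-attention width and concatenated as conditioning tokens. The sketch below is purely hypothetical (FusionAdapter and its dimensions are invented for illustration) and is not the paper's design.

```python
import torch
import torch.nn as nn

class FusionAdapter(nn.Module):
    """Hypothetical sketch: fuse condition embeddings into one context."""
    def __init__(self, slot_dim: int, cloth_dim: int, ctx_dim: int):
        super().__init__()
        self.slot_proj = nn.Linear(slot_dim, ctx_dim)
        self.cloth_proj = nn.Linear(cloth_dim, ctx_dim)

    def forward(self, slot_tokens: torch.Tensor, cloth_tokens: torch.Tensor) -> torch.Tensor:
        # slot_tokens: (B, K, slot_dim) from slot attention;
        # cloth_tokens: (B, T, cloth_dim) from, e.g., a garment image encoder.
        ctx = torch.cat([self.slot_proj(slot_tokens),
                         self.cloth_proj(cloth_tokens)], dim=1)
        return ctx  # (B, K+T, ctx_dim) cross-attention context for the denoiser
```

Under this reading, the fused context would be passed as the cross-attention conditioning of the inpainting diffusion model at each denoising step; the actual integration in Slot-VTON may differ.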