Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis

Zebin Yao, Fangxiang Feng, Ruifan Li, Xiaojie Wang
arXiv - CS - Multimedia, published 2024-08-07
DOI: arxiv-2408.03632 (https://doi.org/arxiv-2408.03632)
Citations: 0

Abstract

The customization of text-to-image models has seen significant advancements, yet generating multiple personalized concepts remains a challenging task. Current methods struggle with attribute leakage and layout confusion when handling multiple concepts, leading to reduced concept fidelity and semantic consistency. In this work, we introduce a novel training-free framework, Concept Conductor, designed to ensure visual fidelity and correct layout in multi-concept customization. Concept Conductor isolates the sampling processes of multiple custom models to prevent attribute leakage between different concepts and corrects erroneous layouts through self-attention-based spatial guidance. Additionally, we present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. This technique injects the structure and appearance of personalized concepts through feature fusion in the attention layers, ensuring harmony in the final image. Extensive qualitative and quantitative experiments demonstrate that Concept Conductor can consistently generate composite images with accurate layouts while preserving the visual details of each concept. Compared to existing baselines, Concept Conductor shows significant performance improvements. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts. The code and models are available at https://github.com/Nihukat/Concept-Conductor.
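
As a rough illustration of the mask-based feature fusion mentioned in the abstract, the following is a minimal sketch, not the authors' implementation (the linked repository is the reference for that). It assumes features are flattened into a list of per-position vectors, that each concept comes with a mask over the same positions, and that masks do not overlap. The function name `fuse_features` and the data layout are hypothetical and chosen only for this example.

```python
from typing import List, Sequence

# Hypothetical layout: one feature vector per spatial position.
FeatureMap = List[List[float]]


def fuse_features(
    base: FeatureMap,
    concept_features: Sequence[FeatureMap],
    masks: Sequence[Sequence[float]],
) -> FeatureMap:
    """Blend each concept's features into the base features inside its mask.

    A mask weight of 1.0 takes the concept's feature at that position,
    0.0 keeps the base feature, and values in between interpolate.
    Masks are assumed not to overlap (an assumption of this sketch).
    """
    if len(concept_features) != len(masks):
        raise ValueError("need exactly one mask per concept")

    fused = [list(vec) for vec in base]  # copy so the input is not mutated
    for feats, mask in zip(concept_features, masks):
        if len(feats) != len(base) or len(mask) != len(base):
            raise ValueError("features and masks must cover the same positions")
        for pos, weight in enumerate(mask):
            if weight:
                fused[pos] = [
                    weight * c + (1.0 - weight) * b
                    for c, b in zip(feats[pos], fused[pos])
                ]
    return fused


# Three positions, two channels; concept A owns position 0, concept B position 2.
base = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
concept_a = [[1.0, 1.0]] * 3
concept_b = [[2.0, 2.0]] * 3
fused = fuse_features(base, [concept_a, concept_b], [[1, 0, 0], [0, 0, 1]])
```

Positions outside every mask keep the base features, which is one simple way to let a shared background coexist with per-concept regions. How the actual method derives its shape-aware masks and which attention layers it fuses are not specified in the abstract.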