Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis
Zebin Yao, Fangxiang Feng, Ruifan Li, Xiaojie Wang
arXiv - CS - Multimedia, 2024-08-07, doi: arxiv-2408.03632
The customization of text-to-image models has seen significant advancements,
yet generating multiple personalized concepts remains a challenging task.
Current methods struggle with attribute leakage and layout confusion when
handling multiple concepts, leading to reduced concept fidelity and semantic
consistency. In this work, we introduce a novel training-free framework,
Concept Conductor, designed to ensure visual fidelity and correct layout in
multi-concept customization. Concept Conductor isolates the sampling processes
of multiple custom models to prevent attribute leakage between different
concepts and corrects erroneous layouts through self-attention-based spatial
guidance. Additionally, we present a concept injection technique that employs
shape-aware masks to specify the generation area for each concept. This
technique injects the structure and appearance of personalized concepts through
feature fusion in the attention layers, ensuring harmony in the final image.
Extensive qualitative and quantitative experiments demonstrate that Concept
Conductor can consistently generate composite images with accurate layouts
while preserving the visual details of each concept. Compared to existing
baselines, Concept Conductor shows significant performance improvements. Our
method supports the combination of any number of concepts and maintains high
fidelity even when dealing with visually similar concepts. The code and models
are available at https://github.com/Nihukat/Concept-Conductor.
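
As a rough illustration of the idea of using per-concept masks to decide where each concept's features enter the final image, the sketch below blends feature maps in NumPy. It is not the authors' implementation: the function name `fuse_features`, the array shapes, and the simple per-pixel blend are all assumptions made for this example, and the abstract does not specify how the masks are computed or which attention layers are involved. The repository linked above is the authoritative source.

```python
import numpy as np

def fuse_features(base, concept_feats, masks):
    """Blend per-concept feature maps into a base feature map using masks.

    Hypothetical helper for illustration only.
    base:          (H, W, C) features from a base sampling branch
    concept_feats: list of (H, W, C) features, one per custom model
    masks:         list of (H, W) arrays with values in [0, 1], one per concept
    """
    fused = base.copy()
    for feat, mask in zip(concept_feats, masks):
        # Add a channel axis so the mask broadcasts over all C channels.
        m = mask[..., None].astype(base.dtype)
        # Inside the mask take the concept's features; outside keep what is there.
        fused = m * feat + (1.0 - m) * fused
    return fused

# Toy example: a 2x2 feature map with one channel and two concepts.
base = np.zeros((2, 2, 1))
feat_a = np.full((2, 2, 1), 1.0)
feat_b = np.full((2, 2, 1), 2.0)
mask_a = np.array([[1, 0], [0, 0]])  # concept A occupies the top-left cell
mask_b = np.array([[0, 0], [0, 1]])  # concept B occupies the bottom-right cell
fused = fuse_features(base, [feat_a, feat_b], [mask_a, mask_b])
```

Because each concept only writes inside its own mask, features from one custom model cannot overwrite the region assigned to another as long as the masks do not overlap, which is the intuition behind keeping attributes from leaking between concepts.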