$C^{2}D$: Context-aware Concept Decomposition for Personalized Text-to-Image Synthesis
Jiang Xin, Xiaonan Fang, Xueling Zhu, Ju Ren, Yaoxue Zhang
IEEE Transactions on Visualization and Computer Graphics, 2025. DOI: 10.1109/TVCG.2025.3579776
Concept decomposition is a technique for personalized text-to-image synthesis that learns textual embeddings of subconcepts from images depicting an original concept. The learned subconcepts can then be composed to create new images. However, existing methods fail to resolve contextual conflicts when subconcepts from different sources are combined, because contextual information remains encapsulated within the subconcept embeddings. To tackle this problem, we propose a Context-aware Concept Decomposition ($C^{2}D$) framework. Specifically, we introduce a Similarity-Guided Divergent Embedding (SGDE) method to obtain subconcept embeddings. We then eliminate the latent contextual dependence among the subconcept embeddings and reconstruct the contextual information as an independent contextual embedding. This independent context can be combined with various subconcepts, enabling more controllable text-to-image synthesis based on subconcept recombination. Extensive experimental results demonstrate that our method outperforms existing approaches in both image quality and contextual consistency.
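The abstract gives no implementation details, but the core idea can be sketched in a few lines: optimize several subconcept embeddings plus one separate context embedding, with a term that pushes subconcept embeddings apart so each captures a distinct aspect. The sketch below is a minimal, hypothetical PyTorch illustration, not the authors' code: the embedding names, the toy reconstruction target, the loss weight, and the cosine-divergence penalty are all assumptions; the actual SGDE objective and the conditioning of a frozen diffusion model are not specified in the abstract.

```python
# Illustrative sketch only (not the paper's released implementation).
import torch
import torch.nn.functional as F

dim = 768  # typical CLIP text-embedding width (assumption)

# Learnable tokens: two subconcept embeddings and one independent
# context embedding, mirroring the decomposition described above.
sub_a = torch.randn(dim, requires_grad=True)
sub_b = torch.randn(dim, requires_grad=True)
context = torch.randn(dim, requires_grad=True)

optimizer = torch.optim.Adam([sub_a, sub_b, context], lr=1e-3)

def reconstruction_loss(embeddings):
    """Toy stand-in for the real objective: a full pipeline would
    inject these embeddings as pseudo-tokens into a prompt and use a
    frozen text-to-image model's denoising loss instead."""
    target = torch.ones(dim)  # hypothetical target signal
    return F.mse_loss(torch.stack(embeddings).sum(0), target)

for step in range(200):
    optimizer.zero_grad()
    # Reconstruct the original concept from subconcepts + context.
    rec = reconstruction_loss([sub_a, sub_b, context])
    # Divergence term: lower cosine similarity pushes the two
    # subconcept embeddings apart (loosely in the spirit of
    # "similarity-guided divergent embedding"; the paper's actual
    # formulation may differ).
    div = F.cosine_similarity(sub_a, sub_b, dim=0)
    loss = rec + 0.1 * div
    loss.backward()
    optimizer.step()
```

Under this reading, keeping `context` as its own learnable vector is what allows it to be dropped or swapped at generation time, so subconcepts from different source images can be recombined without inheriting each other's context; how the paper enforces that independence during training is not described in the abstract.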