CTD-Inpainting: Towards the Coherence of Text-driven Inpainting with Blended Diffusion

IF 14.7 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion Pub Date : 2025-04-22 DOI:10.1016/j.inffus.2025.103163

Yan Zhong, Xinping Zhao, Guangzhi Zhao, Bohua Chen, Fei Hao, Ruoyu Zhao, Jiaqi He, Lei Shi, Li Zhang

Information Fusion, Volume 122, Article 103163. https://www.sciencedirect.com/science/article/pii/S1566253525002362

Citations: 0

Abstract


CTD-inpainting: Towards the Coherence of Text-driven Inpainting with Blended Diffusion

Text-driven inpainting has recently emerged as a prominent and challenging research topic in image completion, where approaches based on denoising diffusion probabilistic models (DDPM) have achieved state-of-the-art performance on authentic and diverse images. However, ensuring high image fidelity during generation remains critical to effective text-driven inpainting. Moreover, guaranteeing coherence between the unmasked region (background) and the generated results in the masked regions poses a significant challenge, both in how to measure it and in how to implement it. To address these issues, we propose CTD-Inpainting, a novel text-driven inpainting framework that incorporates a coherence constraint between the masked and unmasked regions. Specifically, CTD-Inpainting employs a pre-trained contrastive language-image model (CLIP) to guide DDPM-based generation, aligning it with the text prompt. Additionally, we introduce a transition region between the background and the masked region via mask expansion. This transition region helps maintain coherence between the foreground and background by ensuring consistency between the generated results and the original background during inpainting. At each denoising step, we employ a blending technique in which multiple noise-injected versions of the input image are harmonized with the latent diffusion guided by the text and by the coherence constraint in the transition region. This enables seamless integration of conditional information with the generated information via resampling. Additionally, we design an innovative coherence metric based on the coherence constraint, providing a quantitative measure for the subjective assessment of coherence. Extensive experiments demonstrate the superiority of CTD-Inpainting over state-of-the-art methods on real-world and diverse images.
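The abstract names two mechanical ingredients, mask expansion to obtain a transition region and per-step blending of a noise-injected input with the generated sample, without giving formulas. The sketch below illustrates only the generic form of those two ideas in NumPy. It is not the authors' implementation: the function names, the square dilation kernel, the `radius` parameter, and the single-sample blend are all assumptions made for illustration, and the CLIP guidance, coherence constraint, resampling, and use of multiple noised versions are omitted because the abstract does not specify them.

```python
import numpy as np


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2*radius+1)-wide square kernel (assumed shape)."""
    h, w = mask.shape
    padded = np.pad(mask.astype(bool), radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the mask within the kernel window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def transition_region(mask: np.ndarray, radius: int) -> np.ndarray:
    """Band of pixels added by expanding the mask: expanded mask minus original."""
    return dilate(mask, radius) & ~mask.astype(bool)


def noise_to_step(x0: np.ndarray, alpha_bar_t: float, rng: np.random.Generator) -> np.ndarray:
    """Standard DDPM forward sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


def blend_step(x_generated, x0_known, mask, alpha_bar_t, rng):
    """Keep generated pixels inside the mask and the noised original outside it."""
    x_known_t = noise_to_step(x0_known, alpha_bar_t, rng)
    m = mask.astype(x_generated.dtype)
    return m * x_generated + (1.0 - m) * x_known_t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = np.zeros((7, 7), dtype=bool)
    mask[3, 3] = True
    band = transition_region(mask, radius=1)
    print("transition pixels:", int(band.sum()))
    x0 = rng.standard_normal((7, 7))
    x_gen = rng.standard_normal((7, 7))
    x_t = blend_step(x_gen, x0, dilate(mask, 1), alpha_bar_t=0.5, rng=rng)
    print("blended shape:", x_t.shape)
```

Whether the blend uses the original or the expanded mask is left to the caller here; the abstract only says that the coherence constraint acts in the transition region.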


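Source journal

Information Fusion (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months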

About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.