Scribble-Guided Diffusion for Training-free Text-to-Image Generation
Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim
arXiv:2409.08026 (arXiv - CS - Computer Vision and Pattern Recognition), published 2024-09-12
Abstract
Recent advancements in text-to-image diffusion models have demonstrated
remarkable success, yet they often struggle to fully capture the user's intent.
Existing approaches using textual inputs combined with bounding boxes or region
masks fall short in providing precise spatial guidance, often leading to
misaligned or unintended object orientation. To address these limitations, we
propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that
utilizes simple user-provided scribbles as visual prompts to guide image
generation. However, incorporating scribbles into diffusion models presents
challenges due to their sparse and thin nature, making it difficult to ensure
accurate orientation alignment. To overcome these challenges, we introduce
moment alignment and scribble propagation, which allow for more effective and
flexible alignment between generated images and scribble inputs. Experimental
results on the PASCAL-Scribble dataset demonstrate significant improvements in
spatial control and consistency, showcasing the effectiveness of scribble-based
guidance in diffusion models. Our code is available at
https://github.com/kaist-cvml-lab/scribble-diffusion.
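
The abstract names two mechanisms, moment alignment and scribble propagation, without further detail. The sketch below is one plausible reading, not the paper's verified implementation: it first thickens the sparse scribble (a stand-in for scribble propagation, here via simple binary dilation), then matches first- and second-order spatial moments (centroid and principal-axis orientation) between a cross-attention map and the scribble mask. All function names (`spatial_moments`, `moment_alignment_loss`, `propagate_scribble`) and the specific choice of image moments and dilation are illustrative assumptions.

```python
# Illustrative sketch of "moment alignment" and "scribble propagation"
# (assumptions, not the authors' released code).
import numpy as np
from scipy.ndimage import binary_dilation


def propagate_scribble(scribble_mask: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Thicken a sparse, thin scribble into a fuller region mask
    (a simple stand-in for the paper's scribble propagation)."""
    return binary_dilation(scribble_mask > 0, iterations=iterations).astype(float)


def spatial_moments(weights: np.ndarray):
    """Return the centroid (cy, cx) and principal-axis angle of a non-negative 2D map."""
    h, w = weights.shape
    total = weights.sum() + 1e-8
    ys, xs = np.mgrid[0:h, 0:w]
    cy = (weights * ys).sum() / total
    cx = (weights * xs).sum() / total
    # Central second-order moments define the orientation of the principal axis.
    mu_yy = (weights * (ys - cy) ** 2).sum() / total
    mu_xx = (weights * (xs - cx) ** 2).sum() / total
    mu_xy = (weights * (ys - cy) * (xs - cx)).sum() / total
    angle = 0.5 * np.arctan2(2.0 * mu_xy, mu_xx - mu_yy)
    return np.array([cy, cx]), angle


def moment_alignment_loss(attention: np.ndarray, scribble_mask: np.ndarray) -> float:
    """Penalize mismatch between the attention map's and the scribble's
    centroid and principal orientation."""
    c_attn, a_attn = spatial_moments(attention)
    c_scr, a_scr = spatial_moments(propagate_scribble(scribble_mask))
    centroid_term = np.linalg.norm(c_attn - c_scr)
    # Wrap the angle difference so that 0 and pi describe the same axis.
    angle_term = np.abs(np.angle(np.exp(2j * (a_attn - a_scr)))) / 2.0
    return float(centroid_term + angle_term)


if __name__ == "__main__":
    # Toy usage: a diagonal scribble vs. an axis-aligned blob of attention.
    scribble = np.zeros((64, 64))
    np.fill_diagonal(scribble, 1.0)
    attn = np.zeros((64, 64))
    attn[20:40, 10:30] = 1.0
    print(moment_alignment_loss(attn, scribble))
```

In a training-free setup, a loss of this kind would typically be evaluated on cross-attention maps during sampling and its gradient used to nudge the latent toward the scribble's position and orientation; the actual guidance procedure used by ScribbleDiff may differ and is documented in the linked repository.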