Scribble-Guided Diffusion for Training-free Text-to-Image Generation

Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim
{"title":"用于免训练文本到图像生成的涂鸦引导扩散技术","authors":"Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim","doi":"arxiv-2409.08026","DOIUrl":null,"url":null,"abstract":"Recent advancements in text-to-image diffusion models have demonstrated\nremarkable success, yet they often struggle to fully capture the user's intent.\nExisting approaches using textual inputs combined with bounding boxes or region\nmasks fall short in providing precise spatial guidance, often leading to\nmisaligned or unintended object orientation. To address these limitations, we\npropose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that\nutilizes simple user-provided scribbles as visual prompts to guide image\ngeneration. However, incorporating scribbles into diffusion models presents\nchallenges due to their sparse and thin nature, making it difficult to ensure\naccurate orientation alignment. To overcome these challenges, we introduce\nmoment alignment and scribble propagation, which allow for more effective and\nflexible alignment between generated images and scribble inputs. Experimental\nresults on the PASCAL-Scribble dataset demonstrate significant improvements in\nspatial control and consistency, showcasing the effectiveness of scribble-based\nguidance in diffusion models. Our code is available at\nhttps://github.com/kaist-cvml-lab/scribble-diffusion.","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Scribble-Guided Diffusion for Training-free Text-to-Image Generation\",\"authors\":\"Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim\",\"doi\":\"arxiv-2409.08026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advancements in text-to-image diffusion models have demonstrated\\nremarkable success, yet they often struggle to fully capture the user's intent.\\nExisting approaches using textual inputs combined with bounding boxes or region\\nmasks fall short in providing precise spatial guidance, often leading to\\nmisaligned or unintended object orientation. To address these limitations, we\\npropose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that\\nutilizes simple user-provided scribbles as visual prompts to guide image\\ngeneration. However, incorporating scribbles into diffusion models presents\\nchallenges due to their sparse and thin nature, making it difficult to ensure\\naccurate orientation alignment. To overcome these challenges, we introduce\\nmoment alignment and scribble propagation, which allow for more effective and\\nflexible alignment between generated images and scribble inputs. Experimental\\nresults on the PASCAL-Scribble dataset demonstrate significant improvements in\\nspatial control and consistency, showcasing the effectiveness of scribble-based\\nguidance in diffusion models. 
Our code is available at\\nhttps://github.com/kaist-cvml-lab/scribble-diffusion.\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.08026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.08026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Recent advancements in text-to-image diffusion models have demonstrated remarkable success, yet they often struggle to fully capture the user's intent. Existing approaches using textual inputs combined with bounding boxes or region masks fall short in providing precise spatial guidance, often leading to misaligned or unintended object orientation. To address these limitations, we propose Scribble-Guided Diffusion (ScribbleDiff), a training-free approach that utilizes simple user-provided scribbles as visual prompts to guide image generation. However, incorporating scribbles into diffusion models presents challenges due to their sparse and thin nature, making it difficult to ensure accurate orientation alignment. To overcome these challenges, we introduce moment alignment and scribble propagation, which allow for more effective and flexible alignment between generated images and scribble inputs. Experimental results on the PASCAL-Scribble dataset demonstrate significant improvements in spatial control and consistency, showcasing the effectiveness of scribble-based guidance in diffusion models. Our code is available at https://github.com/kaist-cvml-lab/scribble-diffusion.
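The abstract does not spell out how moment alignment is computed. As a minimal, hypothetical sketch only (not the authors' implementation; all names such as `spatial_moments` and `moment_alignment_loss` are illustrative), the snippet below compares first- and second-order image moments of a binary scribble mask and a cross-attention map, yielding a differentiable loss that penalizes mismatched centroids and principal-axis orientations.

```python
import torch

def spatial_moments(m: torch.Tensor):
    """Centroid and principal-axis angle of a non-negative 2D map via image moments.

    m: (H, W) tensor, e.g. a binary scribble mask or a cross-attention map.
    Returns (centroid_xy, theta).
    """
    h, w = m.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=m.dtype, device=m.device),
        torch.arange(w, dtype=m.dtype, device=m.device),
        indexing="ij",
    )
    m00 = m.sum().clamp_min(1e-8)                      # total mass
    cx = (m * xs).sum() / m00                          # centroid x
    cy = (m * ys).sum() / m00                          # centroid y
    mu20 = (m * (xs - cx) ** 2).sum() / m00            # central second moments
    mu02 = (m * (ys - cy) ** 2).sum() / m00
    mu11 = (m * (xs - cx) * (ys - cy)).sum() / m00
    theta = 0.5 * torch.atan2(2 * mu11, mu20 - mu02)   # orientation of principal axis
    return torch.stack([cx, cy]), theta


def moment_alignment_loss(attn_map: torch.Tensor, scribble: torch.Tensor) -> torch.Tensor:
    """Illustrative loss: centroid and orientation mismatch between an attention
    map and its corresponding scribble mask (hypothetical, for explanation only)."""
    c_a, t_a = spatial_moments(attn_map)
    c_s, t_s = spatial_moments(scribble)
    h, w = attn_map.shape
    scale = torch.tensor([w, h], dtype=attn_map.dtype, device=attn_map.device)
    centroid_term = ((c_a - c_s) / scale).pow(2).sum()  # normalized centroid distance
    orient_term = torch.sin(t_a - t_s).pow(2)           # angles compared modulo pi
    return centroid_term + orient_term
```

In a training-free, guidance-style setup, a loss of this kind would typically be evaluated on per-token cross-attention maps at each denoising step and its gradient used to nudge the latents; the exact formulation used by ScribbleDiff, including scribble propagation, is described in the paper and the linked repository.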