Sketch-guided scene image generation with diffusion model

IF 2.5 · Zone 4 (Computer Science) · Q2 (Computer Science, Software Engineering)
Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie
{"title":"Sketch-guided scene image generation with diffusion model","authors":"Tianyu Zhang,&nbsp;Xiaoxuan Xie,&nbsp;Xusheng Du,&nbsp;Haoran Xie","doi":"10.1016/j.cag.2025.104226","DOIUrl":null,"url":null,"abstract":"<div><div>Text-to-image models showcase the impressive ability to generate high-quality and diverse images. However, the transition from freehand sketches to complex scene images with multiple objects remains challenging in computer graphics. In this study, we propose a novel sketch-guided scene image generation framework, decomposing the task of scene image generation from sketch inputs into object-level cross-domain generation and scene-level image construction steps. We first employ a pre-trained diffusion model to convert each single object drawing into a separate image, which can infer additional image details while maintaining the sparse sketch structure. To preserve the conceptual fidelity of the foreground during scene generation, we invert the visual features of object images into identity embeddings for scene generation. For scene-level image construction, we generate the latent representation of the scene image using the separated background prompts. Then, we blend the generated foreground objects with the background image guided by the layout of sketch inputs. We infer the scene image on the blended latent representation using a global prompt with the trained identity tokens to ensure the foreground objects’ details remain unchanged while naturally composing the scene image. Through qualitative and quantitative experiments, we demonstrated that the proposed method’s ability surpasses the state-of-the-art approaches for scene image generation from hand-drawn sketches.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"129 ","pages":"Article 104226"},"PeriodicalIF":2.5000,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Graphics-Uk","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0097849325000676","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0

Abstract

Text-to-image models showcase the impressive ability to generate high-quality and diverse images. However, the transition from freehand sketches to complex scene images with multiple objects remains challenging in computer graphics. In this study, we propose a novel sketch-guided scene image generation framework, decomposing the task of scene image generation from sketch inputs into object-level cross-domain generation and scene-level image construction steps. We first employ a pre-trained diffusion model to convert each single object drawing into a separate image, which can infer additional image details while maintaining the sparse sketch structure. To preserve the conceptual fidelity of the foreground during scene generation, we invert the visual features of object images into identity embeddings for scene generation. For scene-level image construction, we generate the latent representation of the scene image using the separated background prompts. Then, we blend the generated foreground objects with the background image guided by the layout of sketch inputs. We infer the scene image on the blended latent representation using a global prompt with the trained identity tokens to ensure the foreground objects’ details remain unchanged while naturally composing the scene image. Through qualitative and quantitative experiments, we demonstrate that the proposed method surpasses state-of-the-art approaches for scene image generation from hand-drawn sketches.
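To make the scene-level construction step more concrete, the following is a minimal, hypothetical PyTorch sketch of the latent blending described in the abstract: foreground object latents are pasted into the background latent at the regions indicated by the sketch layout, before the blended latent is denoised with the global prompt containing the trained identity tokens. The function name, tensor shapes, and bounding-box format are illustrative assumptions, not the authors' actual implementation.

# Hypothetical sketch of scene-level latent blending, assuming latents in
# standard (1, C, H, W) layout and sketch-derived bounding boxes in latent
# coordinates. Not the authors' implementation.
import torch
import torch.nn.functional as F

def blend_scene_latents(background_latent, object_latents, layout_boxes):
    """Paste each foreground object latent into the background latent at the
    region given by its sketch bounding box (x0, y0, x1, y1).

    background_latent: tensor of shape (1, C, H, W)
    object_latents:    list of tensors of shape (1, C, h_i, w_i)
    layout_boxes:      list of (x0, y0, x1, y1) tuples from the sketch layout
    """
    scene_latent = background_latent.clone()
    for obj_latent, (x0, y0, x1, y1) in zip(object_latents, layout_boxes):
        # Resize the object latent to fit its layout region, then overwrite
        # the corresponding region of the background latent.
        region = F.interpolate(obj_latent, size=(y1 - y0, x1 - x0),
                               mode="bilinear", align_corners=False)
        scene_latent[:, :, y0:y1, x0:x1] = region
    return scene_latent

In this reading of the pipeline, the blended latent would then be denoised by the diffusion model conditioned on a global prompt that includes the learned identity tokens, so that the foreground objects keep their inverted appearance while the final scene is composed naturally.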
Source journal
Computers & Graphics-Uk (Engineering & Technology / Computer Science: Software Engineering)
CiteScore: 5.30
Self-citation rate: 12.00%
Annual articles: 173
Review time: 38 days
Journal description: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics. We are particularly interested in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge research on CG.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.