Shitong Cao, Xuejie Zhang, Jin Wang, Xiaobing Zhou
TopoDiff: Training-free image generation with topological layout control
Expert Systems with Applications, Volume 291, Article 128556. Published 2025-06-11. DOI: 10.1016/j.eswa.2025.128556. Impact factor: 7.5 (JCR Q1, Computer Science, Artificial Intelligence).
https://www.sciencedirect.com/science/article/pii/S095741742502175X
Citations: 0
Abstract
Recent diffusion models can generate high-quality images from text, but their spatial control remains limited. This work aims to improve layout control in text-to-image generation without retraining existing models. The proposed TopoDiff framework is a training-free approach that uses topological guidance to enable precise spatial control during inference, leaving the original architecture and parameters of Stable Diffusion unmodified. It employs a graph-based topological language to explicitly capture spatial relationships between objects and integrates a topological loss into the diffusion model's denoising process. In addition, a dynamic offset mechanism adjusts spatial positions during generation, balancing topological structure consistency with the flexibility required for complex generation. Experimental results show that TopoDiff achieves over 10% higher Average Precision (AP) than Stable Diffusion. The source code is publicly available at https://github.com/marcocst/TopoDiff.
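The abstract does not give the form of the topological loss, so the following is only a toy sketch of the general idea of training-free, loss-guided control: a spatial relation between two objects is written as a differentiable penalty, and gradient steps are taken on the per-object maps while no model weights change. Everything here is an assumption made for illustration, not the paper's method: the single "left of" relation, the centroid-based hinge loss, the `margin` and `lr` values, and all function names are hypothetical, and plain NumPy logit maps stand in for the attention maps of a real diffusion model.

```python
import numpy as np

def softmax2d(logits):
    """Normalize a 2D logit map into a spatial probability map."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def centroid_x(attn):
    """Expected column index under a spatial probability map."""
    xs = np.arange(attn.shape[1])
    return float((attn.sum(axis=0) * xs).sum())

def left_of_loss_and_grads(logits_a, logits_b, margin=2.0):
    """Hinge loss that is zero once A's centroid is `margin` columns left of B's.

    Hypothetical stand-in for one edge (A, left_of, B) of a relation graph.
    """
    attn_a, attn_b = softmax2d(logits_a), softmax2d(logits_b)
    ca, cb = centroid_x(attn_a), centroid_x(attn_b)
    loss = max(0.0, margin + ca - cb)
    if loss == 0.0:
        return loss, np.zeros_like(logits_a), np.zeros_like(logits_b)
    xs = np.arange(logits_a.shape[1])[None, :]
    # For a softmax map, d(centroid)/d(logit_i) = attn_i * (x_i - centroid).
    grad_a = attn_a * (xs - ca)
    grad_b = -attn_b * (xs - cb)
    return loss, grad_a, grad_b

def guide(logits_a, logits_b, steps=50, lr=5.0):
    """Take gradient steps on the maps only; no network parameters are updated."""
    a, b = logits_a.copy(), logits_b.copy()
    for _ in range(steps):
        _, grad_a, grad_b = left_of_loss_and_grads(a, b)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

if __name__ == "__main__":
    # Start from uninformative (uniform) maps for both objects.
    a0, b0 = np.zeros((8, 8)), np.zeros((8, 8))
    a1, b1 = guide(a0, b0)
    print("loss before:", left_of_loss_and_grads(a0, b0)[0])
    print("loss after: ", left_of_loss_and_grads(a1, b1)[0])
```

In a real pipeline the gradient would presumably be taken with respect to the noisy latent at each denoising step rather than with respect to the maps directly; that step depends on the model internals and is not sketched here.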
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.