Memory-efficient filter-guided diffusion with domain transform filtering

Impact Factor: 2.8 · JCR Q2, Computer Science, Software Engineering · CAS Tier 4 (Computer Science)
Gustavo Lopes Tamiosso, Caetano Müller, Lucas Spagnolo Bombana, Manuel M. Oliveira
Published in: Computers & Graphics, Volume 132, Article 104389
DOI: 10.1016/j.cag.2025.104389
Publication date: 2025-09-08
URL: https://www.sciencedirect.com/science/article/pii/S0097849325002304
Citations: 0

Abstract

Diffusion models are powerful tools for image synthesis and editing, yet preserving structural content from a guidance image remains challenging. Filter-Guided Diffusion (FGD) tackles this by applying edge-preserving filtering at each denoising step. However, the original FGD relies on joint bilateral filtering, which incurs high VRAM and computational costs, limiting its scalability to high-resolution images. We propose Domain Transform Filter-Guided Diffusion (DT-FGD), a lightweight variant that replaces bilateral filtering with the efficient domain transform filter and introduces a normalization strategy for the guidance image’s latent representation. DT-FGD achieves significantly lower VRAM usage and faster inference while improving structural consistency. Our method produces images that better align with the text prompt and vary smoothly under filter parameter changes, leading to more predictable outcomes. Experiments show that DT-FGD can reduce VRAM consumption by over 50%, accelerates inference, and scales to high resolutions on a single GPU—unlike prior approaches. We further present a variant that offers even greater memory savings at the cost of additional inference time. DT-FGD enables structure-preserving diffusion on resource-constrained hardware and opens new directions for high-resolution, controllable image synthesis.
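The efficiency gain comes from the domain transform filter (Gastal & Oliveira, 2011), which warps the spatial axis by the guide image's gradients so that a cheap 1D recursive blur becomes edge-preserving. The sketch below is a minimal 1D NumPy illustration of that recursive formulation, not the paper's implementation; the function name, parameter defaults, and sigma schedule are assumptions for demonstration.

```python
import numpy as np

def domain_transform_filter_1d(signal, guide, sigma_s=8.0, sigma_r=0.4, iterations=3):
    """Illustrative 1D recursive domain transform filter (after Gastal & Oliveira, 2011)."""
    signal = np.asarray(signal, dtype=np.float64)
    guide = np.asarray(guide, dtype=np.float64)
    # Domain-transform derivative: large guide gradients stretch the domain,
    # so the recursive blur stops at edges of the guidance signal.
    dIdx = np.abs(np.diff(guide, prepend=guide[:1]))
    dt = 1.0 + (sigma_s / sigma_r) * dIdx
    out = signal.copy()
    n = iterations
    for i in range(n):
        # Halve the filter's standard deviation each pass so the cascade of
        # recursive filters approximates a single Gaussian of std sigma_s.
        sigma_i = sigma_s * np.sqrt(3.0) * 2.0 ** (n - i - 1) / np.sqrt(4.0 ** n - 1.0)
        a = np.exp(-np.sqrt(2.0) / sigma_i)
        w = a ** dt  # per-sample feedback coefficient; tiny across strong edges
        # Left-to-right causal pass
        for x in range(1, len(out)):
            out[x] += w[x] * (out[x - 1] - out[x])
        # Right-to-left pass for a symmetric impulse response
        for x in range(len(out) - 2, -1, -1):
            out[x] += w[x + 1] * (out[x + 1] - out[x])
    return out
```

For 2D latents the same recursion would be applied along rows and then columns in each pass; DT-FGD's appeal is that this recursion is O(n) in the number of samples and needs no large intermediate buffers, unlike a joint bilateral filter.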


Source journal
Computers & Graphics (UK) · Engineering & Technology - Computer Science: Software Engineering
CiteScore: 5.30
Self-citation rate: 12.00%
Articles per year: 173
Review time: 38 days
Journal description: Computers & Graphics is dedicated to disseminating information on research and applications of computer graphics (CG) techniques. The journal encourages articles on:
1. Research and applications of interactive computer graphics, with particular interest in novel interaction techniques and applications of CG to problem domains.
2. State-of-the-art papers on late-breaking, cutting-edge CG research.
3. Information on innovative uses of graphics principles and technologies.
4. Tutorial papers on both teaching CG principles and innovative uses of CG in education.