{"title":"EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution","authors":"Detian Huang , Miaohua Ruan , Yaohui Guo , Zhenzhen Hu , Huanqiang Zeng","doi":"10.1016/j.cviu.2025.104453","DOIUrl":null,"url":null,"abstract":"<div><div>Diffusion Models (DMs) have achieved promising success in Real-world Image Super-Resolution (Real-ISR), where they reconstruct High-Resolution (HR) images from available Low-Resolution (LR) counterparts with unknown degradation by leveraging pre-trained Text-to-Image (T2I) diffusion models. However, due to the randomness nature of DMs and the severe degradation commonly presented in LR images, most DMs-based Real-ISR methods neglect the structure-level and semantic information, which results in reconstructed HR images suffering not only from important edge missing, but also from undesired regional information confusion. To tackle these challenges, we propose an Enhancing Prior-guided Diffusion model (EPDiff) for Real-ISR, which leverages high-frequency priors and semantic guidance to generate reconstructed images with realistic details. Firstly, we design a Guide Adapter (GA) module that extracts latent texture and edge features from LR images to provide high-frequency priors. Subsequently, we introduce a Semantic Prompt Extractor (SPE) that generates high-quality semantic prompts to enhance image understanding. Additionally, we build a Feature Rectify ControlNet (FRControlNet) to refine feature modulation, enabling realistic detail generation. Extensive experiments demonstrate that the proposed EPDiff outperforms state-of-the-art methods on both synthetic and real-world datasets.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104453"},"PeriodicalIF":3.5000,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225001766","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Diffusion Models (DMs) have achieved promising success in Real-world Image Super-Resolution (Real-ISR), where they reconstruct High-Resolution (HR) images from available Low-Resolution (LR) counterparts with unknown degradation by leveraging pre-trained Text-to-Image (T2I) diffusion models. However, due to the inherent randomness of DMs and the severe degradation commonly present in LR images, most DM-based Real-ISR methods neglect structure-level and semantic information, so the reconstructed HR images suffer not only from missing important edges but also from undesired confusion of regional information. To tackle these challenges, we propose an Enhancing Prior-guided Diffusion model (EPDiff) for Real-ISR, which leverages high-frequency priors and semantic guidance to generate reconstructed images with realistic details. First, we design a Guide Adapter (GA) module that extracts latent texture and edge features from LR images to provide high-frequency priors. Second, we introduce a Semantic Prompt Extractor (SPE) that generates high-quality semantic prompts to enhance image understanding. Additionally, we build a Feature Rectify ControlNet (FRControlNet) to refine feature modulation, enabling realistic detail generation. Extensive experiments demonstrate that the proposed EPDiff outperforms state-of-the-art methods on both synthetic and real-world datasets.
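To make the pipeline described above more concrete, the minimal PyTorch sketch below shows how the three named components (GA, SPE, FRControlNet) could compose in a single conditioning step. Only the component names and their roles come from the abstract; every module internal, layer choice, and tensor shape here is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of EPDiff-style conditioning. Module internals are
# assumptions for illustration only; the paper's actual architecture differs.
import torch
import torch.nn as nn


class GuideAdapter(nn.Module):
    """Assumed stand-in for the Guide Adapter (GA): extracts latent texture and
    edge features from the LR input to serve as high-frequency priors."""
    def __init__(self, in_ch=4, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, lr_latent):
        return self.net(lr_latent)


class SemanticPromptExtractor(nn.Module):
    """Assumed stand-in for the SPE: maps the LR latent to a prompt-like
    semantic embedding for conditioning a T2I diffusion backbone."""
    def __init__(self, in_ch=4, embed_dim=768, tokens=77):
        super().__init__()
        self.tokens, self.embed_dim = tokens, embed_dim
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(in_ch, tokens * embed_dim)

    def forward(self, lr_latent):
        pooled = self.pool(lr_latent).flatten(1)          # (B, in_ch)
        return self.proj(pooled).view(-1, self.tokens, self.embed_dim)


class FRControlNet(nn.Module):
    """Assumed stand-in for FRControlNet: fuses the noisy latent with the
    high-frequency prior and emits a modulation residual for the denoiser."""
    def __init__(self, latent_ch=4, feat_ch=64):
        super().__init__()
        self.fuse = nn.Conv2d(latent_ch + feat_ch, latent_ch, 3, padding=1)

    def forward(self, noisy_latent, hf_prior):
        return self.fuse(torch.cat([noisy_latent, hf_prior], dim=1))


if __name__ == "__main__":
    lr_latent = torch.randn(1, 4, 64, 64)      # LR image encoded to latent space
    noisy_latent = torch.randn(1, 4, 64, 64)   # current diffusion state

    ga, spe, fr = GuideAdapter(), SemanticPromptExtractor(), FRControlNet()
    hf_prior = ga(lr_latent)                   # high-frequency priors from LR
    prompt_embed = spe(lr_latent)              # semantic conditioning
    control_residual = fr(noisy_latent, hf_prior)
    # In a full pipeline, control_residual and prompt_embed would condition a
    # pre-trained T2I diffusion UNet's denoising step (not shown here).
    print(control_residual.shape, prompt_embed.shape)
```

The design choice illustrated is simply that the LR input is used twice: once to derive spatial high-frequency guidance and once to derive semantic guidance, with a ControlNet-style branch injecting both into the frozen diffusion backbone.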
Journal Introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems