{"title":"EPDiff: Enhancing Prior-guided Diffusion model for Real-world Image Super-Resolution","authors":"Detian Huang , Miaohua Ruan , Yaohui Guo , Zhenzhen Hu , Huanqiang Zeng","doi":"10.1016/j.cviu.2025.104453","DOIUrl":null,"url":null,"abstract":"<div><div>Diffusion Models (DMs) have achieved promising success in Real-world Image Super-Resolution (Real-ISR), where they reconstruct High-Resolution (HR) images from available Low-Resolution (LR) counterparts with unknown degradation by leveraging pre-trained Text-to-Image (T2I) diffusion models. However, due to the randomness nature of DMs and the severe degradation commonly presented in LR images, most DMs-based Real-ISR methods neglect the structure-level and semantic information, which results in reconstructed HR images suffering not only from important edge missing, but also from undesired regional information confusion. To tackle these challenges, we propose an Enhancing Prior-guided Diffusion model (EPDiff) for Real-ISR, which leverages high-frequency priors and semantic guidance to generate reconstructed images with realistic details. Firstly, we design a Guide Adapter (GA) module that extracts latent texture and edge features from LR images to provide high-frequency priors. Subsequently, we introduce a Semantic Prompt Extractor (SPE) that generates high-quality semantic prompts to enhance image understanding. Additionally, we build a Feature Rectify ControlNet (FRControlNet) to refine feature modulation, enabling realistic detail generation. Extensive experiments demonstrate that the proposed EPDiff outperforms state-of-the-art methods on both synthetic and real-world datasets.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"259 ","pages":"Article 104453"},"PeriodicalIF":3.5000,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314225001766","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Diffusion Models (DMs) have achieved promising success in Real-world Image Super-Resolution (Real-ISR), where they reconstruct High-Resolution (HR) images from available Low-Resolution (LR) counterparts with unknown degradation by leveraging pre-trained Text-to-Image (T2I) diffusion models. However, due to the inherent randomness of DMs and the severe degradation commonly present in LR images, most DM-based Real-ISR methods neglect structure-level and semantic information, so the reconstructed HR images suffer not only from missing important edges but also from undesired confusion of regional information. To tackle these challenges, we propose an Enhancing Prior-guided Diffusion model (EPDiff) for Real-ISR, which leverages high-frequency priors and semantic guidance to generate reconstructed images with realistic details. First, we design a Guide Adapter (GA) module that extracts latent texture and edge features from LR images to provide high-frequency priors. Second, we introduce a Semantic Prompt Extractor (SPE) that generates high-quality semantic prompts to enhance image understanding. Additionally, we build a Feature Rectify ControlNet (FRControlNet) to refine feature modulation, enabling realistic detail generation. Extensive experiments demonstrate that the proposed EPDiff outperforms state-of-the-art methods on both synthetic and real-world datasets.
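To make the pipeline described above more concrete, the minimal PyTorch sketch below shows how the three named components (GA, SPE, FRControlNet) could compose in a single conditioning step. Only the component names and their roles come from the abstract; every module internal, layer choice, and tensor shape here is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of EPDiff-style conditioning. Module internals are
# assumptions for illustration only; the paper's actual architecture differs.
import torch
import torch.nn as nn


class GuideAdapter(nn.Module):
    """Assumed stand-in for the Guide Adapter (GA): extracts latent texture and
    edge features from the LR input to serve as high-frequency priors."""
    def __init__(self, in_ch=4, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )

    def forward(self, lr_latent):
        return self.net(lr_latent)


class SemanticPromptExtractor(nn.Module):
    """Assumed stand-in for the SPE: maps the LR latent to a prompt-like
    semantic embedding for conditioning a T2I diffusion backbone."""
    def __init__(self, in_ch=4, embed_dim=768, tokens=77):
        super().__init__()
        self.tokens, self.embed_dim = tokens, embed_dim
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(in_ch, tokens * embed_dim)

    def forward(self, lr_latent):
        pooled = self.pool(lr_latent).flatten(1)          # (B, in_ch)
        return self.proj(pooled).view(-1, self.tokens, self.embed_dim)


class FRControlNet(nn.Module):
    """Assumed stand-in for FRControlNet: fuses the noisy latent with the
    high-frequency prior and emits a modulation residual for the denoiser."""
    def __init__(self, latent_ch=4, feat_ch=64):
        super().__init__()
        self.fuse = nn.Conv2d(latent_ch + feat_ch, latent_ch, 3, padding=1)

    def forward(self, noisy_latent, hf_prior):
        return self.fuse(torch.cat([noisy_latent, hf_prior], dim=1))


if __name__ == "__main__":
    lr_latent = torch.randn(1, 4, 64, 64)      # LR image encoded to latent space
    noisy_latent = torch.randn(1, 4, 64, 64)   # current diffusion state

    ga, spe, fr = GuideAdapter(), SemanticPromptExtractor(), FRControlNet()
    hf_prior = ga(lr_latent)                   # high-frequency priors from LR
    prompt_embed = spe(lr_latent)              # semantic conditioning
    control_residual = fr(noisy_latent, hf_prior)
    # In a full pipeline, control_residual and prompt_embed would condition a
    # pre-trained T2I diffusion UNet's denoising step (not shown here).
    print(control_residual.shape, prompt_embed.shape)
```

The design choice illustrated is simply that the LR input is used twice: once to derive spatial high-frequency guidance and once to derive semantic guidance, with a ControlNet-style branch injecting both into the frozen diffusion backbone.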
Journal Introduction:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems