Diff-RCformer: A diffusion-augmented Recursive Context Transformer for image super-resolution

Impact Factor 7.2 · CAS Tier 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Shuo Wang, Shuzhen Xu, Cuicui Lv, Chaoqing Ma, Fangbo Cai
{"title":"Diff-RCformer:一种用于图像超分辨率的扩散增强递归上下文转换器","authors":"Shuo Wang ,&nbsp;Shuzhen Xu ,&nbsp;Cuicui Lv ,&nbsp;Chaoqing Ma ,&nbsp;Fangbo Cai","doi":"10.1016/j.knosys.2025.113758","DOIUrl":null,"url":null,"abstract":"<div><div>Diffusion models have recently exhibited strong potential in single-image super-resolution (SISR) by effectively modeling complex data distributions and generating high-quality reconstructions. However, existing diffusion-based SISR methods often suffer from excessive iterative steps, resulting in a high computational overhead and slow convergence. In addition, traditional convolutional neural networks and Transformer-based architectures have difficulty in capturing complex global information, thereby limiting the reconstruction quality. To address these issues, we propose Diff-RCformer, which is a novel SISR framework that integrates diffusion-based prior generation with the Recursive Context Transformer (RCformer) to achieve robust and efficient super-resolution. Specifically, we use the diffusion model to generate high-quality prior features for super-resolution by iteratively refining Gaussian noise in a compressed latent space. These prior features are then injected into the RCformer, guiding it to reconstruct the high-resolution image. In the RCformer, we introduce Prior-Guided Recursive Generalization Network (PG-RGN) blocks. These blocks recursively aggregate the input features into representative feature maps, enabling them to adapt flexibly to input features of different dimensions and extract global information through cross-attention. We also combine the PG-RGN with Prior-Guided Local Self-Attention (PG-LSA) to enable the model to capture local detail features accurately and enhance the utilization of the global context. To achieve an optimal combination of local and global features, we propose Adaptive Feature Integration (AFI), which efficiently fuses local and global features across multiple attention layers. Our method also supports cascaded super-resolution, enabling flexible multi-stage refinement, which is particularly useful for complex scenarios. Comprehensive experiments on standard benchmarks indicate that Diff-RCformer surpasses recent state-of-the-art methods both quantitatively and qualitatively. <span><span>https://github.com/SureT-T/Diff-RCformer</span><svg><path></path></svg></span></div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"323 ","pages":"Article 113758"},"PeriodicalIF":7.2000,"publicationDate":"2025-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Diff-RCformer: A diffusion-augmented Recursive Context Transformer for image super-resolution\",\"authors\":\"Shuo Wang ,&nbsp;Shuzhen Xu ,&nbsp;Cuicui Lv ,&nbsp;Chaoqing Ma ,&nbsp;Fangbo Cai\",\"doi\":\"10.1016/j.knosys.2025.113758\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Diffusion models have recently exhibited strong potential in single-image super-resolution (SISR) by effectively modeling complex data distributions and generating high-quality reconstructions. However, existing diffusion-based SISR methods often suffer from excessive iterative steps, resulting in a high computational overhead and slow convergence. In addition, traditional convolutional neural networks and Transformer-based architectures have difficulty in capturing complex global information, thereby limiting the reconstruction quality. 
To address these issues, we propose Diff-RCformer, which is a novel SISR framework that integrates diffusion-based prior generation with the Recursive Context Transformer (RCformer) to achieve robust and efficient super-resolution. Specifically, we use the diffusion model to generate high-quality prior features for super-resolution by iteratively refining Gaussian noise in a compressed latent space. These prior features are then injected into the RCformer, guiding it to reconstruct the high-resolution image. In the RCformer, we introduce Prior-Guided Recursive Generalization Network (PG-RGN) blocks. These blocks recursively aggregate the input features into representative feature maps, enabling them to adapt flexibly to input features of different dimensions and extract global information through cross-attention. We also combine the PG-RGN with Prior-Guided Local Self-Attention (PG-LSA) to enable the model to capture local detail features accurately and enhance the utilization of the global context. To achieve an optimal combination of local and global features, we propose Adaptive Feature Integration (AFI), which efficiently fuses local and global features across multiple attention layers. Our method also supports cascaded super-resolution, enabling flexible multi-stage refinement, which is particularly useful for complex scenarios. Comprehensive experiments on standard benchmarks indicate that Diff-RCformer surpasses recent state-of-the-art methods both quantitatively and qualitatively. <span><span>https://github.com/SureT-T/Diff-RCformer</span><svg><path></path></svg></span></div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"323 \",\"pages\":\"Article 113758\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2025-05-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125008044\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125008044","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Diffusion models have recently exhibited strong potential in single-image super-resolution (SISR) by effectively modeling complex data distributions and generating high-quality reconstructions. However, existing diffusion-based SISR methods often require excessive iterative steps, resulting in high computational overhead and slow convergence. In addition, traditional convolutional neural networks and Transformer-based architectures struggle to capture complex global information, which limits reconstruction quality. To address these issues, we propose Diff-RCformer, a novel SISR framework that integrates diffusion-based prior generation with a Recursive Context Transformer (RCformer) to achieve robust and efficient super-resolution. Specifically, we use a diffusion model to generate high-quality prior features for super-resolution by iteratively refining Gaussian noise in a compressed latent space. These prior features are then injected into the RCformer, guiding it to reconstruct the high-resolution image. Within the RCformer, we introduce Prior-Guided Recursive Generalization Network (PG-RGN) blocks, which recursively aggregate the input features into representative feature maps, allowing them to adapt flexibly to input features of different dimensions and to extract global information through cross-attention. We also combine PG-RGN with Prior-Guided Local Self-Attention (PG-LSA) so that the model captures local detail accurately while making fuller use of the global context. To combine local and global features optimally, we propose Adaptive Feature Integration (AFI), which efficiently fuses local and global features across multiple attention layers. Our method also supports cascaded super-resolution, enabling flexible multi-stage refinement that is particularly useful for complex scenarios. Comprehensive experiments on standard benchmarks show that Diff-RCformer surpasses recent state-of-the-art methods both quantitatively and qualitatively. Code: https://github.com/SureT-T/Diff-RCformer
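Only the abstract is available on this page, so the following is a minimal PyTorch sketch of the idea it describes, not the authors' implementation: a block that conditions image tokens on a diffusion-generated latent prior, runs a local self-attention branch (standing in for PG-LSA) and a token-aggregation cross-attention branch (standing in for PG-RGN), and fuses the two with a learned gate (standing in for AFI). The class name, tensor shapes, pooling-based aggregation, and gating scheme are all illustrative assumptions.

```python
# Hypothetical sketch in the spirit of Diff-RCformer; the real PG-RGN /
# PG-LSA / AFI designs are not specified in the abstract, so everything
# below (names, shapes, aggregation, gating) is an assumption.
import torch
import torch.nn as nn


class PriorGuidedBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4, agg_tokens: int = 64):
        super().__init__()
        # Inject the diffusion-refined prior by modulating the features.
        self.prior_proj = nn.Linear(dim, 2 * dim)
        # Local branch: plain self-attention stands in for PG-LSA.
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Global branch: pool the N tokens down to a few representative
        # tokens (a crude stand-in for recursive generalization), then
        # cross-attend from the full sequence back to them.
        self.aggregate = nn.AdaptiveAvgPool1d(agg_tokens)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # AFI-style fusion: a learned per-channel gate between branches.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) flattened image tokens; prior: (B, C) latent prior.
        scale, shift = self.prior_proj(prior).chunk(2, dim=-1)
        h = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

        local, _ = self.local_attn(h, h, h)                      # PG-LSA stand-in
        rep = self.aggregate(h.transpose(1, 2)).transpose(1, 2)  # (B, M, C)
        global_, _ = self.cross_attn(h, rep, rep)                # PG-RGN stand-in

        g = self.gate(torch.cat([local, global_], dim=-1))       # AFI stand-in
        return x + g * local + (1 - g) * global_


if __name__ == "__main__":
    block = PriorGuidedBlock(dim=64)
    tokens = torch.randn(2, 1024, 64)   # e.g. a 32x32 feature map, flattened
    prior = torch.randn(2, 64)          # diffusion-refined latent prior
    print(block(tokens, prior).shape)   # torch.Size([2, 1024, 64])
```

The residual gated sum lets the block interpolate per channel between local detail and aggregated global context, which is one plausible reading of how AFI "fuses local and global features across multiple attention layers".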
Source journal: Knowledge-Based Systems (Engineering & Technology – Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Annual article count: 1245
Average review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial-intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.