MFET: Multi-frequency enhancement transformer for single-image super-resolution

IF 4.2; Zone 3, Computer Science; JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing, Pub Date: 2025-09-28, DOI: 10.1016/j.imavis.2025.105751

Yunlei Sun, Pengxiao Shi, Tiancheng Chen, Danning Qi, Ke Xu

Image and Vision Computing, Volume 163, Article 105751 (journal article). ScienceDirect page: https://www.sciencedirect.com/science/article/pii/S0262885625003397

Citations: 0

Abstract

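Single-Image Super-Resolution (SISR) aims to reconstruct a high-resolution image from a low-resolution input while effectively preserving structural integrity and fine details. However, (i) low-frequency structural cues progressively fade during deep-layer propagation, and (ii) existing upsampling modules either ignore multi-scale context or incur excessive computation, leading to unsatisfactory high-frequency texture recovery. To address these limitations, we propose the Multi-Frequency Enhancement Transformer (MFET), a novel Transformer-based network tailored for efficient SISR. MFET seamlessly integrates low-frequency structural preservation with high-frequency detail recovery through its Multi-Frequency Block (MFB). The MFB employs a Residual Attention Mechanism (RAM) to propagate fine-grained features across layers, ensuring robust retention of low-level details, and an Efficient Upscale Module (EUM) with a pyramidal structure and depthwise separable convolutions to enhance high-frequency components with minimal computational cost. Extensive experiments on benchmark datasets demonstrate that MFET achieves superior performance in PSNR and SSIM, particularly at ×3 and ×4 scales, excelling in texture and edge reconstruction. MFET strikes an optimal balance between quality and efficiency, offering a promising solution for high-quality super-resolution. Our code is available at https://github.com/snh4/MFET.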
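The abstract does not give MFET's layer definitions, so the following is only a minimal NumPy sketch of two generic building blocks behind cheap upsamplers, assuming stride 1, zero "same" padding, and odd kernel sizes: a depthwise separable convolution, and sub-pixel (pixel-shuffle) rearrangement. Pixel shuffle is an assumption here; the abstract mentions only a pyramidal structure and depthwise separable convolutions, and the pyramid is not modeled. All function names (`conv2d_same`, `depthwise_separable_conv`, `pixel_shuffle`, `efficient_upscale_sketch`, `param_counts`) are hypothetical and are not taken from the MFET repository.

```python
# Hypothetical sketch: generic building blocks, NOT the MFET/EUM implementation.
import numpy as np


def conv2d_same(x, w):
    """Standard stride-1 convolution (cross-correlation, as in DL frameworks)
    with zero 'same' padding. x: (C_in, H, W); w: (C_out, C_in, k, k), k odd."""
    k = w.shape[2]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            # Every output channel mixes every input channel at each tap:
            # k*k*C_in*C_out weights in total.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def depthwise_separable_conv(x, dw, pw):
    """Depthwise k x k conv (one filter per channel, no channel mixing) followed
    by a pointwise 1 x 1 conv that mixes channels.
    x: (C_in, H, W); dw: (C_in, k, k); pw: (C_out, C_in)."""
    k = dw.shape[1]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    depth = np.zeros(x.shape)
    for i in range(k):
        for j in range(k):
            # Each channel is filtered only by its own k x k kernel.
            depth += dw[:, i, j, None, None] * xp[:, i:i + h, j:j + wd]
    # Pointwise step: a 1 x 1 conv is a matrix product over the channel axis.
    return np.einsum("oc,chw->ohw", pw, depth)


def pixel_shuffle(x, r):
    """Sub-pixel rearrangement (C*r*r, H, W) -> (C, H*r, W*r), using the index
    convention of PyTorch's PixelShuffle: out[c, h*r+i, w*r+j] = x[c*r*r + i*r + j, h, w]."""
    cr2, h, w = x.shape
    if cr2 % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = cr2 // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


def efficient_upscale_sketch(x, dw, pw, r):
    """Hypothetical upscale step: a depthwise separable conv produces C*r*r
    channels, then pixel shuffle trades channels for spatial resolution."""
    return pixel_shuffle(depthwise_separable_conv(x, dw, pw), r)


def param_counts(c_in, c_out, k):
    """Weight counts (biases ignored): standard k x k conv vs. its depthwise
    separable replacement."""
    return k * k * c_in * c_out, k * k * c_in + c_in * c_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 6, 5))
    dw = rng.standard_normal((4, 3, 3))
    pw = rng.standard_normal((8, 4))
    # A separable conv equals a standard conv whose kernel factorizes as pw[o, c] * dw[c].
    w_full = pw[:, :, None, None] * dw[None]
    assert np.allclose(conv2d_same(x, w_full), depthwise_separable_conv(x, dw, pw))
    print(efficient_upscale_sketch(x, dw, pw, 2).shape)  # (2, 12, 10)
    print(param_counts(64, 256, 3))  # (147456, 16960)
```

For a 3 × 3 layer mapping 64 channels to 256 (that is, 64 · 2² channels ahead of a ×2 pixel shuffle), `param_counts(64, 256, 3)` gives 147,456 weights for a standard convolution versus 16,960 for the separable version, roughly 8.7× fewer. The equivalence check shows the trade-off: a separable layer can only express kernels that factor into a per-channel spatial filter times a channel-mixing matrix.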

Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.