MAT: Multi-Range Attention Transformer for Efficient Image Super-Resolution

IF 11.1; Zone 1, Engineering and Technology; JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology, Pub Date: 2025-03-21, DOI: 10.1109/TCSVT.2025.3553135

Chengxing Xie;Xiaoming Zhang;Linze Li;Yuqian Fu;Biao Gong;Tianrui Li;Kai Zhang

Volume 35, Issue 9, pp. 8945-8957. IEEE Xplore: https://ieeexplore.ieee.org/document/10935664/

Citations: 0

Abstract

Image super-resolution (SR) has significantly advanced through the adoption of Transformer architectures. However, conventional techniques that enlarge the self-attention window to capture broader context come with inherent drawbacks, especially significantly increased computational demands. Moreover, feature perception within the fixed-size window of existing models restricts the effective receptive field (ERF) and the diversity of intermediate features. We demonstrate that a flexible integration of attention across diverse spatial extents can yield significant performance enhancements. In line with this insight, we introduce the Multi-Range Attention Transformer (MAT) for SR tasks. MAT leverages the computational advantages inherent in the dilation operation, in conjunction with the self-attention mechanism, to facilitate both multi-range attention (MA) and sparse multi-range attention (SMA), enabling efficient capture of both regional and sparse global features. Combined with local feature extraction, MAT adeptly captures dependencies across various spatial ranges, improving the diversity and efficacy of its feature representations. We also introduce the MSConvStar module, which augments the model's ability for multi-range representation learning. Comprehensive experiments show that our MAT exhibits superior performance to existing state-of-the-art SR models with remarkable efficiency ($\sim 3.3\times$ faster than SRFormer-light). The code is available at https://github.com/stella-von/MAT.
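
The abstract's point that dilation widens the attention range without extra cost can be illustrated with a small calculation. The sketch below is not the MAT implementation; the function names, the window size of 8, and the dilation rates are assumptions chosen for illustration. It only shows that sampling a fixed number of tokens on a dilated grid enlarges the covered region while the number of query-key pairs per window stays constant.

```python
def dilated_window(top, left, window, dilation):
    """Coordinates of a window x window grid of tokens sampled every `dilation` pixels.

    Hypothetical helper for illustration; not taken from the MAT code base.
    """
    return [(top + i * dilation, left + j * dilation)
            for i in range(window) for j in range(window)]


def window_stats(window, dilation):
    """Return (extent, tokens, pairs) for one attention window.

    extent: side length in pixels of the region the sampled tokens span
    tokens: number of tokens attended over (independent of dilation)
    pairs:  query-key pairs, which grow with the square of the token count
    """
    tokens = window * window
    extent = (window - 1) * dilation + 1
    pairs = tokens * tokens
    return extent, tokens, pairs


if __name__ == "__main__":
    # Assumed settings: an 8x8 token window with three dilation rates.
    for d in (1, 2, 4):
        extent, tokens, pairs = window_stats(window=8, dilation=d)
        print(f"dilation={d}: spans {extent}x{extent} px, {tokens} tokens, {pairs} pairs")
```

Under these assumed settings, a dense window spanning the same 29x29 region as the dilation-4 case would hold 841 tokens instead of 64, which is the kind of cost growth the abstract attributes to simply enlarging the window.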


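Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering and Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months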

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.