CMASR: Lightweight image super-resolution with cluster and match attention

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2025-02-14 DOI:10.1016/j.imavis.2025.105457

Detian Huang , Mingxin Lin , Hang Liu , Huanqiang Zeng

{"title":"CMASR: Lightweight image super-resolution with cluster and match attention","authors":"Detian Huang , Mingxin Lin , Hang Liu , Huanqiang Zeng","doi":"10.1016/j.imavis.2025.105457","DOIUrl":null,"url":null,"abstract":"<div><div>The Transformer has recently achieved impressive success in image super-resolution due to its ability to model long-range dependencies with multi-head self-attention (MHSA). However, most existing MHSAs focus only on the dependencies among individual tokens, and ignore the ones among token clusters containing several tokens, resulting in the inability of Transformer to adequately explore global features. On the other hand, Transformer neglects local features, which inevitably hinders accurate detail reconstruction. To address the above issues, we propose a lightweight image super-resolution method with cluster and match attention (CMASR). Specifically, a token Clustering block is designed to divide input tokens into token clusters of different sizes with depthwise separable convolution. Subsequently, we propose an efficient axial matching self-attention (AMSA) mechanism, which introduces an axial matrix to extract local features, including axial similarities and symmetries. Further, by combining AMSA and Window Self-Attention, we construct a Hybrid Self-Attention block to capture the dependencies among token clusters of different sizes to sufficiently extract axial local features and global features. Extensive experiments demonstrate that the proposed CMASR outperforms state-of-the-art methods with fewer computational cost (i.e., the number of parameters and FLOPs).</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"155 ","pages":"Article 105457"},"PeriodicalIF":4.2000,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625000459","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

The Transformer has recently achieved impressive success in image super-resolution due to its ability to model long-range dependencies with multi-head self-attention (MHSA). However, most existing MHSAs focus only on the dependencies among individual tokens, and ignore the ones among token clusters containing several tokens, resulting in the inability of Transformer to adequately explore global features. On the other hand, Transformer neglects local features, which inevitably hinders accurate detail reconstruction. To address the above issues, we propose a lightweight image super-resolution method with cluster and match attention (CMASR). Specifically, a token Clustering block is designed to divide input tokens into token clusters of different sizes with depthwise separable convolution. Subsequently, we propose an efficient axial matching self-attention (AMSA) mechanism, which introduces an axial matrix to extract local features, including axial similarities and symmetries. Further, by combining AMSA and Window Self-Attention, we construct a Hybrid Self-Attention block to capture the dependencies among token clusters of different sizes to sufficiently extract axial local features and global features. Extensive experiments demonstrate that the proposed CMASR outperforms state-of-the-art methods with fewer computational cost (i.e., the number of parameters and FLOPs).

查看原文本刊更多论文

CMASR：具有聚类和匹配关注的轻量级图像超分辨率

Transformer最近在图像超分辨率方面取得了令人印象深刻的成功，因为它能够模拟多头自关注（MHSA）的远程依赖关系。然而，大多数现有的mhsa只关注单个令牌之间的依赖关系，而忽略了包含多个令牌的令牌集群之间的依赖关系，导致Transformer无法充分探索全局特性。另一方面，Transformer忽略了局部特征，这不可避免地妨碍了准确的细节重建。为了解决上述问题，我们提出了一种基于聚类和匹配注意（CMASR）的轻量级图像超分辨率方法。具体来说，设计了一个标记聚类块，通过深度可分离卷积将输入标记划分为不同大小的标记聚类。随后，我们提出了一种有效的轴匹配自注意（AMSA）机制，该机制引入轴向矩阵来提取局部特征，包括轴向相似性和对称性。此外，通过结合AMSA和窗口自注意，构建了一个混合自注意块来捕获不同大小的令牌簇之间的依赖关系，以充分提取轴向局部特征和全局特征。大量的实验表明，所提出的CMASR以更少的计算成本（即参数和FLOPs的数量）优于最先进的方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.