High-rank corrected multi-head self attention for image super resolution

IF 7.6 | CAS Region 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Ying Yuan, Zihao Ren, Yajun Qiu, Bin Sun, Shihao Kou, Caiwen Jiang, Tianliang Zhang
DOI: 10.1016/j.knosys.2025.114637
Journal: Knowledge-Based Systems, Vol. 330, Article 114637
Published: 2025-10-11 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0950705125016764
Citations: 0

Abstract

Recently, Transformer-based methods have shown impressive performance in image super resolution (SR) tasks by exploiting multi-head self attention (MSA) to capture long-range dependencies between pixels. Unfortunately, there is a low-rank bottleneck in existing Transformer-based SR methods, which limits SR performance. We demonstrate that this is because the attention map in MSA is restricted from using more non-zero singular values to form a stable representation. Increasing the projection dimension of MSA can eliminate the low-rank bottleneck, but results in an overwhelming computational burden. Furthermore, we observe that the attention maps of different heads in MSA exhibit both information redundancy and complementarity. Based on these findings, we propose High-Rank Corrected Multi-Head Self-Attention (HR-MSA), which captures precise dependency information through high-rank attention maps without introducing additional computational burden. Our HR-MSA first utilizes the complete information of each pixel to compute an unabridged high-rank dependency. It then applies linear corrections to different heads independently, producing high-rank weighted pixel information. Building on HR-MSA, we design a new architecture called High-Rank Attention for Super Resolution (HiRA-SR). Specifically, we develop the Focusing Block (FB) to divert local pixel information away from the HR-MSA module and introduce the Residual Multi-Head Contextual Block (RMCB) to integrate global information through non-local attention. Experiments demonstrate that our HR-MSA can replace MSA and achieve efficient and effective improvements across various state-of-the-art SR methods. With parameters and FLOPs similar to SwinIR-light, our HiRA-SR sets a new state-of-the-art for lightweight image super-resolution. Our code will be available at: https://github.com/yyexplorerNB/HiRA-SR.
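The low-rank bottleneck the abstract refers to can be seen directly from the shape of the attention computation: the logit matrix QKᵀ of a single head has rank at most the head projection dimension d, which in lightweight SR Transformers is far smaller than the number of pixel tokens N, so the attention map can only draw on at most d non-zero singular values before normalization. A minimal NumPy sketch (illustrative only, not the paper's code; N, d, and the Gaussian inputs are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 256, 32  # N pixel tokens, per-head projection dimension d (d << N)

# Random per-head query/key projections standing in for projected pixel features.
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))

# Scaled attention logits: an N x N matrix built from rank-d factors,
# so its rank (= number of non-zero singular values) is capped at d.
logits = (Q @ K.T) / np.sqrt(d)
rank = np.linalg.matrix_rank(logits)
print(rank)  # 32: far below the full rank N = 256
```

Raising d removes this cap but, as the abstract notes, inflates the cost of the projections and of the N x N attention product, which motivates correcting the rank without widening the heads.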
Source journal

Knowledge-Based Systems (Engineering/Technology – Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Articles per year: 1245
Review time: 7.8 months
Journal introduction: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.