Ying Yuan, Zihao Ren, Yajun Qiu, Bin Sun, Shihao Kou, Caiwen Jiang, Tianliang Zhang
{"title":"图像超分辨率高阶校正多头自关注","authors":"Ying Yuan , Zihao Ren , Yajun Qiu , Bin Sun , Shihao Kou , Caiwen Jiang , Tianliang Zhang","doi":"10.1016/j.knosys.2025.114637","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, Transformer-based methods have shown impressive performance in image super resolution (SR) tasks, by exploiting multi-head self attention (MSA) to capture long-range dependencies between pixels. Unfortunately, there is a low-rank bottleneck in existing Transformer-based SR methods, which limits SR performance. We demonstrate that this is because the attention map in MSA is restricted to using more non-zero singular values to make stable representation. Increasing the projection dimension of MSA can eliminate the low-rank bottleneck, but results in overwhelming computational burden. Furthermore, we observe that the attention maps of different heads in MSA exhibit both information redundancy and complementarity. Based on these findings, we propose High-Rank Corrected Multi-Head Self-Attention (HR-MSA) to capture precise dependency information by high-rank attention maps without introducing additional computational burden. Our HR-MSA first utilizes the complete information of each pixel to compute an unabridged high-rank dependency. Then it independently applies linear corrections to different heads and achieves a high-rank weighted pixel information. Building around HR-MSA, we design a new architecture called High-Rank Attention for Super Resolution (HiRA-SR). Specifically, We develop Focusing Block (FB) to divert local pixel information from the HR-MSA module and introduce Residual Multi-Head Contextual Block (RMCB) to integrate global information through non-local attention. Experiments demonstrate that our HR-MSA can replace MSA and achieve efficient and effective improvements across various state-of-the-art SR methods. With parameters and FLOPs similar to SwinIR-light, our HiRA-SR sets a new state-of-the-art for lightweight image super-resolution. Our code will be available at: <span><span>https://github.com/yyexplorerNB/HiRA-SR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114637"},"PeriodicalIF":7.6000,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"High-rank corrected multi-head self attention for image super resolution\",\"authors\":\"Ying Yuan , Zihao Ren , Yajun Qiu , Bin Sun , Shihao Kou , Caiwen Jiang , Tianliang Zhang\",\"doi\":\"10.1016/j.knosys.2025.114637\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recently, Transformer-based methods have shown impressive performance in image super resolution (SR) tasks, by exploiting multi-head self attention (MSA) to capture long-range dependencies between pixels. Unfortunately, there is a low-rank bottleneck in existing Transformer-based SR methods, which limits SR performance. We demonstrate that this is because the attention map in MSA is restricted to using more non-zero singular values to make stable representation. Increasing the projection dimension of MSA can eliminate the low-rank bottleneck, but results in overwhelming computational burden. Furthermore, we observe that the attention maps of different heads in MSA exhibit both information redundancy and complementarity. 
Based on these findings, we propose High-Rank Corrected Multi-Head Self-Attention (HR-MSA) to capture precise dependency information by high-rank attention maps without introducing additional computational burden. Our HR-MSA first utilizes the complete information of each pixel to compute an unabridged high-rank dependency. Then it independently applies linear corrections to different heads and achieves a high-rank weighted pixel information. Building around HR-MSA, we design a new architecture called High-Rank Attention for Super Resolution (HiRA-SR). Specifically, We develop Focusing Block (FB) to divert local pixel information from the HR-MSA module and introduce Residual Multi-Head Contextual Block (RMCB) to integrate global information through non-local attention. Experiments demonstrate that our HR-MSA can replace MSA and achieve efficient and effective improvements across various state-of-the-art SR methods. With parameters and FLOPs similar to SwinIR-light, our HiRA-SR sets a new state-of-the-art for lightweight image super-resolution. Our code will be available at: <span><span>https://github.com/yyexplorerNB/HiRA-SR</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"330 \",\"pages\":\"Article 114637\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125016764\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125016764","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
High-rank corrected multi-head self attention for image super resolution
Recently, Transformer-based methods have shown impressive performance in image super-resolution (SR) tasks by exploiting multi-head self-attention (MSA) to capture long-range dependencies between pixels. Unfortunately, existing Transformer-based SR methods suffer from a low-rank bottleneck that limits SR performance. We demonstrate that this is because the attention map in MSA is restricted from using more non-zero singular values to form a stable representation. Increasing the projection dimension of MSA can eliminate the low-rank bottleneck, but results in an overwhelming computational burden. Furthermore, we observe that the attention maps of different heads in MSA exhibit both information redundancy and complementarity. Based on these findings, we propose High-Rank Corrected Multi-Head Self-Attention (HR-MSA) to capture precise dependency information through high-rank attention maps without introducing additional computational burden. Our HR-MSA first utilizes the complete information of each pixel to compute an unabridged high-rank dependency. It then independently applies linear corrections to different heads to obtain high-rank weighted pixel information. Building on HR-MSA, we design a new architecture called High-Rank Attention for Super Resolution (HiRA-SR). Specifically, we develop a Focusing Block (FB) to divert local pixel information from the HR-MSA module and introduce a Residual Multi-Head Contextual Block (RMCB) to integrate global information through non-local attention. Experiments demonstrate that our HR-MSA can replace MSA and deliver efficient and effective improvements across various state-of-the-art SR methods. With parameters and FLOPs similar to SwinIR-light, our HiRA-SR sets a new state of the art for lightweight image super-resolution. Our code will be available at: https://github.com/yyexplorerNB/HiRA-SR.
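The abstract describes the mechanism only at a high level, so the following PyTorch sketch is offered as a rough illustration, not the authors' method. It shows (a) the low-rank bottleneck itself: per-head attention logits Q_h K_h^T with projection dimension d_h have rank at most d_h, whereas full-dimension logits can reach rank C; and (b) a hypothetical HR-MSA-style module in the spirit described, where one attention map is computed from the complete channel dimension and each head then applies its own linear correction. All names here (HighRankMSA, scale, shift) and the exact affine form of the per-head correction are assumptions; the official implementation is at the GitHub link above.

```python
# Illustrative sketch only; module names and the form of the per-head
# "linear correction" are assumptions, not the authors' implementation.
# Official code: https://github.com/yyexplorerNB/HiRA-SR
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighRankMSA(nn.Module):
    """Hypothetical HR-MSA-style block: one attention map is computed from
    the full channel dimension (so its logits are not rank-capped by the
    per-head dimension), then each head applies its own affine correction
    before weighting its slice of the values."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Per-head affine correction of the shared attention map (assumption).
        self.scale = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.shift = nn.Parameter(torch.zeros(num_heads, 1, 1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Full-dimension logits: rank bounded by C, not by C // num_heads.
        # Cost is O(N^2 * C), the same as H per-head maps of dim C // H.
        attn = F.softmax(q @ k.transpose(-2, -1) / C**0.5, dim=-1)  # (B, N, N)
        # Independent linear correction per head -> (B, H, N, N).
        attn = attn.unsqueeze(1) * self.scale + self.shift
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    torch.manual_seed(0)
    N, C, H = 64, 32, 4  # tokens, channels, heads

    # The bottleneck: per-head logits Q_h K_h^T have rank <= C // H,
    # while full-dimension logits can reach rank C.
    q_h, k_h = torch.randn(N, C // H), torch.randn(N, C // H)
    q_f, k_f = torch.randn(N, C), torch.randn(N, C)
    print(torch.linalg.matrix_rank(q_h @ k_h.T).item())  # 8
    print(torch.linalg.matrix_rank(q_f @ k_f.T).item())  # 32

    y = HighRankMSA(dim=C, num_heads=H)(torch.randn(2, N, C))
    print(y.shape)  # torch.Size([2, 64, 32])
```

Note that the shared full-dimension logits cost O(N²C), matching the combined cost of H per-head maps of dimension C/H, which is consistent with the abstract's claim of no additional computational burden.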
Journal introduction:
Knowledge-Based Systems is an international, interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research. It focuses on systems built with knowledge-based and other artificial-intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practice, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.