Ying Yuan, Zihao Ren, Yajun Qiu, Bin Sun, Shihao Kou, Caiwen Jiang, Tianliang Zhang
{"title":"图像超分辨率高阶校正多头自关注","authors":"Ying Yuan , Zihao Ren , Yajun Qiu , Bin Sun , Shihao Kou , Caiwen Jiang , Tianliang Zhang","doi":"10.1016/j.knosys.2025.114637","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, Transformer-based methods have shown impressive performance in image super resolution (SR) tasks, by exploiting multi-head self attention (MSA) to capture long-range dependencies between pixels. Unfortunately, there is a low-rank bottleneck in existing Transformer-based SR methods, which limits SR performance. We demonstrate that this is because the attention map in MSA is restricted to using more non-zero singular values to make stable representation. Increasing the projection dimension of MSA can eliminate the low-rank bottleneck, but results in overwhelming computational burden. Furthermore, we observe that the attention maps of different heads in MSA exhibit both information redundancy and complementarity. Based on these findings, we propose High-Rank Corrected Multi-Head Self-Attention (HR-MSA) to capture precise dependency information by high-rank attention maps without introducing additional computational burden. Our HR-MSA first utilizes the complete information of each pixel to compute an unabridged high-rank dependency. Then it independently applies linear corrections to different heads and achieves a high-rank weighted pixel information. Building around HR-MSA, we design a new architecture called High-Rank Attention for Super Resolution (HiRA-SR). Specifically, We develop Focusing Block (FB) to divert local pixel information from the HR-MSA module and introduce Residual Multi-Head Contextual Block (RMCB) to integrate global information through non-local attention. Experiments demonstrate that our HR-MSA can replace MSA and achieve efficient and effective improvements across various state-of-the-art SR methods. With parameters and FLOPs similar to SwinIR-light, our HiRA-SR sets a new state-of-the-art for lightweight image super-resolution. Our code will be available at: <span><span>https://github.com/yyexplorerNB/HiRA-SR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"330 ","pages":"Article 114637"},"PeriodicalIF":7.6000,"publicationDate":"2025-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"High-rank corrected multi-head self attention for image super resolution\",\"authors\":\"Ying Yuan , Zihao Ren , Yajun Qiu , Bin Sun , Shihao Kou , Caiwen Jiang , Tianliang Zhang\",\"doi\":\"10.1016/j.knosys.2025.114637\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recently, Transformer-based methods have shown impressive performance in image super resolution (SR) tasks, by exploiting multi-head self attention (MSA) to capture long-range dependencies between pixels. Unfortunately, there is a low-rank bottleneck in existing Transformer-based SR methods, which limits SR performance. We demonstrate that this is because the attention map in MSA is restricted to using more non-zero singular values to make stable representation. Increasing the projection dimension of MSA can eliminate the low-rank bottleneck, but results in overwhelming computational burden. Furthermore, we observe that the attention maps of different heads in MSA exhibit both information redundancy and complementarity. 
Based on these findings, we propose High-Rank Corrected Multi-Head Self-Attention (HR-MSA) to capture precise dependency information by high-rank attention maps without introducing additional computational burden. Our HR-MSA first utilizes the complete information of each pixel to compute an unabridged high-rank dependency. Then it independently applies linear corrections to different heads and achieves a high-rank weighted pixel information. Building around HR-MSA, we design a new architecture called High-Rank Attention for Super Resolution (HiRA-SR). Specifically, We develop Focusing Block (FB) to divert local pixel information from the HR-MSA module and introduce Residual Multi-Head Contextual Block (RMCB) to integrate global information through non-local attention. Experiments demonstrate that our HR-MSA can replace MSA and achieve efficient and effective improvements across various state-of-the-art SR methods. With parameters and FLOPs similar to SwinIR-light, our HiRA-SR sets a new state-of-the-art for lightweight image super-resolution. Our code will be available at: <span><span>https://github.com/yyexplorerNB/HiRA-SR</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"330 \",\"pages\":\"Article 114637\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125016764\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125016764","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
High-rank corrected multi-head self attention for image super resolution
Recently, Transformer-based methods have shown impressive performance in image super-resolution (SR) tasks by exploiting multi-head self-attention (MSA) to capture long-range dependencies between pixels. Unfortunately, existing Transformer-based SR methods suffer from a low-rank bottleneck that limits SR performance. We demonstrate that this is because the attention map in MSA is restricted from using more non-zero singular values to form a stable representation. Increasing the projection dimension of MSA can eliminate the low-rank bottleneck, but results in an overwhelming computational burden. Furthermore, we observe that the attention maps of different heads in MSA exhibit both information redundancy and complementarity. Based on these findings, we propose High-Rank Corrected Multi-Head Self-Attention (HR-MSA) to capture precise dependency information through high-rank attention maps without introducing additional computational burden. Our HR-MSA first utilizes the complete information of each pixel to compute an unabridged high-rank dependency. It then independently applies linear corrections to different heads to obtain high-rank weighted pixel information. Building on HR-MSA, we design a new architecture called High-Rank Attention for Super Resolution (HiRA-SR). Specifically, we develop a Focusing Block (FB) to divert local pixel information from the HR-MSA module and introduce a Residual Multi-Head Contextual Block (RMCB) to integrate global information through non-local attention. Experiments demonstrate that our HR-MSA can replace MSA and deliver efficient and effective improvements across various state-of-the-art SR methods. With parameters and FLOPs similar to SwinIR-light, our HiRA-SR sets a new state of the art for lightweight image super-resolution. Our code will be available at: https://github.com/yyexplorerNB/HiRA-SR.
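The abstract describes the mechanism only at a high level, so the following PyTorch sketch is offered as a rough illustration, not the authors' method. It shows (a) the low-rank bottleneck itself: per-head attention logits Q_h K_h^T with projection dimension d_h have rank at most d_h, whereas full-dimension logits can reach rank C; and (b) a hypothetical HR-MSA-style module in the spirit described, where one attention map is computed from the complete channel dimension and each head then applies its own linear correction. All names here (HighRankMSA, scale, shift) and the exact affine form of the per-head correction are assumptions; the official implementation is at the GitHub link above.

```python
# Illustrative sketch only; module names and the form of the per-head
# "linear correction" are assumptions, not the authors' implementation.
# Official code: https://github.com/yyexplorerNB/HiRA-SR
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighRankMSA(nn.Module):
    """Hypothetical HR-MSA-style block: one attention map is computed from
    the full channel dimension (so its logits are not rank-capped by the
    per-head dimension), then each head applies its own affine correction
    before weighting its slice of the values."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # Per-head affine correction of the shared attention map (assumption).
        self.scale = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.shift = nn.Parameter(torch.zeros(num_heads, 1, 1))
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Full-dimension logits: rank bounded by C, not by C // num_heads.
        # Cost is O(N^2 * C), the same as H per-head maps of dim C // H.
        attn = F.softmax(q @ k.transpose(-2, -1) / C**0.5, dim=-1)  # (B, N, N)
        # Independent linear correction per head -> (B, H, N, N).
        attn = attn.unsqueeze(1) * self.scale + self.shift
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    torch.manual_seed(0)
    N, C, H = 64, 32, 4  # tokens, channels, heads

    # The bottleneck: per-head logits Q_h K_h^T have rank <= C // H,
    # while full-dimension logits can reach rank C.
    q_h, k_h = torch.randn(N, C // H), torch.randn(N, C // H)
    q_f, k_f = torch.randn(N, C), torch.randn(N, C)
    print(torch.linalg.matrix_rank(q_h @ k_h.T).item())  # 8
    print(torch.linalg.matrix_rank(q_f @ k_f.T).item())  # 32

    y = HighRankMSA(dim=C, num_heads=H)(torch.randn(2, N, C))
    print(y.shape)  # torch.Size([2, 64, 32])
```

Note that the shared full-dimension logits cost O(N²C), matching the combined cost of H per-head maps of dimension C/H, which is consistent with the abstract's claim of no additional computational burden.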
Journal introduction:
Knowledge-Based Systems is an international, interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research. It focuses on systems built with knowledge-based and other artificial-intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practice, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.