Hybrid attention transformer with re-parameterized large kernel convolution for image super-resolution

IF 4.2 · CAS Tier 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)
Zhicheng Ma, Zhaoxiang Liu, Kai Wang, Shiguo Lian
{"title":"Hybrid attention transformer with re-parameterized large kernel convolution for image super-resolution","authors":"Zhicheng Ma ,&nbsp;Zhaoxiang Liu ,&nbsp;Kai Wang ,&nbsp;Shiguo Lian","doi":"10.1016/j.imavis.2024.105162","DOIUrl":null,"url":null,"abstract":"<div><p>Single image super-resolution is a well-established low-level vision task that aims to reconstruct high-resolution images from low-resolution images. Methods based on Transformer have shown remarkable success and achieved outstanding performance in SISR tasks. While Transformer effectively models global information, it is less effective at capturing high frequencies such as stripes that primarily provide local information. Additionally, it has the potential to further enhance the capture of global information. To tackle this, we propose a novel Large Kernel Hybrid Attention Transformer using re-parameterization. It combines different kernel sizes and different steps re-parameterized convolution layers with Transformer to effectively capture global and local information to learn comprehensive features with low-frequency and high-frequency information. Moreover, in order to solve the problem of using batch normalization layer to introduce artifacts in SISR, we propose a new training strategy which is fusing convolution layer and batch normalization layer after certain training epochs. This strategy can enjoy the acceleration convergence effect of batch normalization layer in training and effectively eliminate the problem of artifacts in the inference stage. For re-parameterization of multiple parallel branch convolution layers, adopting this strategy can further reduce the amount of calculation of training. By coupling these core improvements, our LKHAT achieves state-of-the-art performance for single image super-resolution task.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":null,"pages":null},"PeriodicalIF":4.2000,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624002671","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Single image super-resolution (SISR) is a well-established low-level vision task that aims to reconstruct high-resolution images from low-resolution inputs. Transformer-based methods have shown remarkable success and achieved outstanding performance on SISR tasks. While the Transformer models global information effectively, it is less effective at capturing high-frequency content, such as stripes, that primarily carries local information; moreover, its capture of global information still leaves room for improvement. To address this, we propose a novel Large Kernel Hybrid Attention Transformer (LKHAT) based on re-parameterization. It combines re-parameterized convolution layers of different kernel sizes and strides with the Transformer to capture global and local information effectively, learning comprehensive features that contain both low-frequency and high-frequency information. Moreover, to solve the problem of batch normalization layers introducing artifacts in SISR, we propose a new training strategy that fuses each convolution layer with its batch normalization layer after a certain number of training epochs. This strategy retains the convergence-accelerating effect of batch normalization during training while effectively eliminating artifacts at the inference stage. When re-parameterizing multiple parallel convolution branches, this strategy further reduces the training computation. By coupling these core improvements, our LKHAT achieves state-of-the-art performance on the single image super-resolution task.
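
To make the re-parameterization concrete, the following is a minimal PyTorch sketch of folding two parallel convolution branches with different kernel sizes into a single large-kernel convolution. The branch sizes (7x7 and 3x3), the stride-1 and "same"-padding assumption, and the function name `merge_parallel_branches` are illustrative assumptions, not the exact LKHAT module from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def merge_parallel_branches(conv_large: nn.Conv2d, conv_small: nn.Conv2d) -> nn.Conv2d:
    """Fold a small-kernel branch into a large-kernel branch by zero-padding the
    small kernel to the large kernel size and summing weights and biases.
    Assumes stride 1 and 'same'-style padding on both branches."""
    k_large = conv_large.kernel_size[0]
    k_small = conv_small.kernel_size[0]
    pad = (k_large - k_small) // 2

    merged = nn.Conv2d(
        conv_large.in_channels,
        conv_large.out_channels,
        kernel_size=k_large,
        padding=conv_large.padding,
        bias=True,
    )
    with torch.no_grad():
        # Zero-pad the small kernel on its last two (spatial) dims to k_large x k_large.
        small_padded = F.pad(conv_small.weight, [pad, pad, pad, pad])
        merged.weight.copy_(conv_large.weight + small_padded)
        zeros = torch.zeros(merged.out_channels)
        bias_large = conv_large.bias if conv_large.bias is not None else zeros
        bias_small = conv_small.bias if conv_small.bias is not None else zeros
        merged.bias.copy_(bias_large + bias_small)
    return merged

# Equivalence check: the merged convolution reproduces the sum of the two branches.
x = torch.randn(1, 16, 32, 32)
branch_large = nn.Conv2d(16, 16, kernel_size=7, padding=3)
branch_small = nn.Conv2d(16, 16, kernel_size=3, padding=1)
fused = merge_parallel_branches(branch_large, branch_small)
print(torch.allclose(fused(x), branch_large(x) + branch_small(x), atol=1e-5))
```

Because convolution is linear, summing the zero-padded small kernel into the large kernel leaves the function of the block unchanged while replacing several parallel branches with a single convolution at inference time.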

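The training strategy described above fuses each convolution layer with its batch normalization layer after a certain number of epochs. The sketch below shows the standard conv-BN folding arithmetic that such a fusion relies on; the function name, the `FUSE_EPOCH` threshold, and the `block.conv` / `block.bn` attribute names are assumptions for illustration, not the authors' implementation.

```python
import copy
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d whose weight and bias absorb the BatchNorm affine
    transform (computed from the BN running statistics)."""
    fused = copy.deepcopy(conv)
    if fused.bias is None:
        fused.bias = nn.Parameter(torch.zeros(fused.out_channels))

    with torch.no_grad():
        std = torch.sqrt(bn.running_var + bn.eps)
        scale = bn.weight / std  # per-channel gamma / sigma
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

# Usage sketch inside a training loop (FUSE_EPOCH and the attribute names are assumed):
# if epoch == FUSE_EPOCH:
#     block.conv = fuse_conv_bn(block.conv, block.bn)
#     block.bn = nn.Identity()
```

Since the folded convolution reproduces the BatchNorm output computed with running statistics, fusing at a chosen epoch removes BN from the remaining training steps and from the inference graph without changing the network function at the moment of fusion.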

Source journal: Image and Vision Computing (Engineering Technology, Engineering: Electronic & Electrical)
CiteScore: 8.50
Self-citation rate: 8.50%
Articles per year: 143
Review time: 7.8 months
Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.