Exploring high-quality image deraining Transformer via effective large kernel attention

The Visual Computer | Pub Date: 2024-07-02 | DOI: 10.1007/s00371-024-03551-8

Haobo Dong, Tianyu Song, Xuanyu Qi, Jiyu Jin, Guiyue Jin, Lei Fan


Citations: 0

Abstract

In recent years, Transformers have demonstrated strong performance in single image deraining. However, the standard self-attention in the Transformer makes it difficult to model local image features effectively. To alleviate this problem, this paper proposes a high-quality deraining Transformer with effective large kernel attention, named ELKAformer. The network employs the Transformer-Style Effective Large Kernel Conv-Block (ELKB), which contains three key designs that guide the extraction of rich features: the Large Kernel Attention Block (LKAB), the Dynamical Enhancement Feed-forward Network (DEFN), and the Edge Squeeze Recovery Block (ESRB). Specifically, LKAB introduces convolutional modulation as a substitute for vanilla self-attention to achieve better local representations. DEFN refines the most valuable attention values in LKAB, allowing the overall design to better preserve pixel-wise information. Additionally, ESRB is developed to capture long-range dependencies across different positional information. Extensive experimental results demonstrate that the method achieves favorable deraining results while effectively reducing computational cost. Our code is available on GitHub.
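The abstract states that LKAB substitutes convolutional modulation for self-attention but gives no equations. The following is a minimal pure-Python sketch of the general idea, assuming the commonly used form out = DWConv(W_a x) * (W_v x) with element-wise multiplication. It is not the authors' LKAB: the function names, the two projection matrices `w_a` and `w_v`, and the kernel sizes are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: a simplified convolutional modulation,
# NOT the ELKAformer implementation. All names here are hypothetical.

def depthwise_conv2d(channel, kernel):
    """Zero-padded 'same' cross-correlation of one H x W channel with a k x k kernel (k odd)."""
    h, w = len(channel), len(channel[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # positions outside the map count as zero
                        acc += channel[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def pointwise(x, weight):
    """1x1 convolution: mixes channels at each pixel. x is C_in x H x W, weight is C_out x C_in."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
         for i in range(h)]
        for o in range(len(weight))
    ]


def conv_modulation(x, w_a, w_v, kernels):
    """Assumed form: out = DWConv(W_a x) * (W_v x), multiplied element-wise.

    The depthwise branch plays the role that the attention map plays in
    self-attention; each channel gets its own (possibly large) kernel.
    """
    a = [depthwise_conv2d(ch, k) for ch, k in zip(pointwise(x, w_a), kernels)]
    v = pointwise(x, w_v)
    return [
        [[a[c][i][j] * v[c][i][j] for j in range(len(v[c][i]))] for i in range(len(v[c]))]
        for c in range(len(v))
    ]


# Tiny usage example: one channel, 3 x 3 map of ones, 3 x 3 kernel of ones.
x = [[[1.0] * 3 for _ in range(3)]]
ones_kernel = [[1.0] * 3 for _ in range(3)]
out = conv_modulation(x, w_a=[[1.0]], w_v=[[1.0]], kernels=[ones_kernel])
```

In this sketch the modulation weights come from a fixed-size local window, so the cost per pixel depends on the kernel area rather than on the number of pixels in the image, which is the usual motivation for replacing global self-attention with a convolutional branch.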

