Codebook prior-guided hybrid attention dehazing network

Impact factor: 4.2, Zone 3 (Computer Science), JCR Q2 (Computer Science, Artificial Intelligence)

Image and Vision Computing, Pub Date: 2025-08-16, DOI: 10.1016/j.imavis.2025.105700

Liqin Huang, Hanyu Zheng, Lin Pan, Zhipeng Su, Qiang Wu

Volume 162, Article 105700. Publication type: Journal Article. Article page: https://www.sciencedirect.com/science/article/pii/S0262885625002884

Citations: 0

Abstract


Codebook prior-guided hybrid attention dehazing network

Transformers have been widely used in image dehazing tasks due to their powerful self-attention mechanism for capturing long-range dependencies. However, directly applying Transformers often leads to coarse details during image reconstruction, especially in complex real-world hazy scenarios. To address this problem, we propose a novel Hybrid Attention Encoder (HAE). Specifically, a channel-attention-based convolution block is integrated into the Swin-Transformer architecture. This design enhances the local features at each position through an overlapping block-wise spatial attention mechanism while leveraging the advantages of channel attention in global information processing to strengthen the network’s representation capability. Moreover, to adapt to various complex hazy environments, a high-quality codebook prior encapsulating the color and texture knowledge of high-resolution clear scenes is introduced. We also propose a more flexible Binary Matching Mechanism (BMM) to better align the codebook prior with the network, further unlocking the potential of the model. Extensive experiments demonstrate that our method consistently outperforms the second-best methods by a margin of 8% to 19% across multiple metrics on the RTTS and URHI datasets. The source code has been released at https://github.com/HanyuZheng25/HADehzeNet.
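The abstract does not specify how features are matched against the codebook prior, so the following is only a minimal sketch of the generic nearest-neighbor lookup commonly used with learned codebooks (vector quantization). It is not the paper's Binary Matching Mechanism, and every name in it (`squared_distance`, `nearest_code`, `quantize`, the toy codebook values) is a hypothetical chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(feature: Vector, codebook: Sequence[Vector]) -> Tuple[int, float]:
    """Return (index, squared distance) of the codebook entry closest to `feature`.

    Assumption: plain nearest-neighbor matching, as in standard vector
    quantization. The paper's BMM is described as more flexible than this.
    """
    best_index = 0
    best_dist = squared_distance(feature, codebook[0])
    for i in range(1, len(codebook)):
        dist = squared_distance(feature, codebook[i])
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


def quantize(features: Sequence[Vector], codebook: Sequence[Vector]) -> List[List[float]]:
    """Replace each feature vector with a copy of its nearest codebook entry."""
    return [list(codebook[nearest_code(f, codebook)[0]]) for f in features]


if __name__ == "__main__":
    # Toy codebook with three 2-D entries (hypothetical values).
    toy_codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
    # Two "encoder features", e.g. degraded by haze, snapped to clean entries.
    toy_features = [[0.1, -0.2], [3.5, 0.4]]
    print(quantize(toy_features, toy_codebook))
```

In a codebook-prior pipeline of this kind, the codebook entries would be learned from clear, high-resolution images, so that replacing a hazy feature with its matched entry injects clean color and texture statistics into the decoder.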


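Source journal

Image and Vision Computing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months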

About the journal: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.