Bidirectional interactive multi-scale network using Wave-Conv ViT for single image deraining

IF 3.4 | Zone 3, Engineering & Technology | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

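Signal Processing: Image Communication | Vol. 137, Article 117311 | Pub Date: 2025-03-23 | DOI: 10.1016/j.image.2025.117311
Full text: https://www.sciencedirect.com/science/article/pii/S092359652500058X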

Siyan Fang, Bin Liu


Citations: 0

Abstract


To address the limitations of the Vision Transformer (ViT) in capturing high-frequency information and the loss of fine details in existing image deraining methods, we introduce a Bidirectional Interactive Multi-Scale Network (BIMNet) that employs a newly developed Wave-Conv ViT (WCV). The WCV uses a wavelet transform to enable self-attention in both the low-frequency and high-frequency domains, significantly enhancing ViT's capacity for diverse frequency-domain feature modeling. Additionally, by incorporating convolutional operations, WCV enhances the extraction and integration of local features across various spatial windows. BIMNet injects rainy images into deep network layers, enabling bidirectional propagation with shallow-layer features that enrich skip connections with detailed and complementary information, thus improving the fidelity of detail recovery. Moreover, we present the CORain1000 dataset, tailored for the dual challenges of image deraining and object detection, which offers more diversity in rain patterns, image sizes, and volumes than the commonly used COCO350 dataset. Extensive experiments demonstrate the superiority of BIMNet over advanced methods. The code and CORain1000 dataset are available at https://github.com/fashyon/BIMNet.
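
The abstract describes WCV only at a high level, so the following is a minimal NumPy sketch of the general idea of "self-attention in both low-frequency and high-frequency domains", not the authors' implementation (that lives in the linked repository). Assumptions, all hypothetical: a single-level orthonormal Haar DWT splits a (C, H, W) feature map into four subbands; each subband is flattened into one token per spatial position and passes through global single-head attention with random, untrained projections; the inverse DWT then restores the original resolution. The convolutional branch, spatial windowing, and everything BIMNet-specific are omitted, and the function names are illustrative only.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of a (C, H, W) array (H, W even).

    Returns four subbands, each of shape (C, H/2, W/2).
    """
    a = x[:, 0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # top minus bottom: responds to horizontal edges
    hl = (a - b + c - d) / 2.0  # left minus right: responds to vertical edges
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2 (the 2x2 Haar transform is orthonormal)."""
    c, h, w = ll.shape
    x = np.empty((c, 2 * h, 2 * w), dtype=ll.dtype)
    x[:, 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[:, 0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[:, 1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[:, 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over (N, C) tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v


def wavelet_self_attention(x, rng):
    """Hypothetical sketch: self-attention inside each Haar subband, then inverse DWT.

    Projection weights are random here; a trained model would learn them.
    """
    channels = x.shape[0]
    out_bands = []
    for band in haar_dwt2(x):
        _, h, w = band.shape
        tokens = band.reshape(channels, h * w).T  # (N, C): one token per position
        wq, wk, wv = (rng.standard_normal((channels, channels)) / np.sqrt(channels)
                      for _ in range(3))
        attended = self_attention(tokens, wq, wk, wv)
        out_bands.append(attended.T.reshape(channels, h, w))
    return haar_idwt2(*out_bands)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))  # toy feature map: 8 channels, 16x16
    print(np.allclose(haar_idwt2(*haar_dwt2(x)), x))  # True: perfect reconstruction
    print(wavelet_self_attention(x, rng).shape)  # (8, 16, 16)
```

Because the Haar transform used here is orthonormal, no information is lost by attending in the wavelet domain, and each high-frequency subband gets its own attention map instead of sharing one with the dominant low-frequency content. Attending on half-resolution subbands also cuts the token count per attention call by a factor of four.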


Source journal

Signal Processing: Image Communication (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 8.40
Self-citation rate: 2.90%
Articles published: 138
Review time: 5.2 months

Journal description: Signal Processing: Image Communication is an international journal for the development of the theory and practice of image communication. Its primary objectives are the following: To present a forum for the advancement of theory and practice of image communication. To stimulate cross-fertilization between areas similar in nature which have traditionally been separated, for example, various aspects of visual communications and information systems. To contribute to a rapid information exchange between the industrial and academic environments. The editorial policy and the technical content of the journal are the responsibility of the Editor-in-Chief, the Area Editors and the Advisory Editors. The Journal is self-supporting from subscription income and contains a minimum amount of advertisements. Advertisements are subject to the prior approval of the Editor-in-Chief. The journal welcomes contributions from every country in the world. Signal Processing: Image Communication publishes articles relating to aspects of the design, implementation and use of image communication systems. The journal features original research work, tutorial and review articles, and accounts of practical developments. Subjects of interest include image/video coding, 3D video representations and compression, 3D graphics and animation compression, HDTV and 3DTV systems, video adaptation, video over IP, peer-to-peer video networking, interactive visual communication, multi-user video conferencing, wireless video broadcasting and communication, visual surveillance, 2D and 3D image/video quality measures, pre/post processing, video restoration and super-resolution, multi-camera video analysis, motion analysis, content-based image/video indexing and retrieval, face and gesture processing, video synthesis, 2D and 3D image/video acquisition and display technologies, architectures for image/video processing and communication.