{"title":"HFGlobalFormer: When High-Frequency Recovery Meets Global Context Modeling for Compressed Image Deraindrop","authors":"Rongqun Lin;Wenhan Yang;Baoliang Chen;Pingping Zhang;Yue Liu;Shiqi Wang;Sam Kwong","doi":"10.1109/TMM.2024.3521831","DOIUrl":null,"url":null,"abstract":"When transmission medium and compression degradation are intertwined, new challenges emerge. This study addresses the problem of raindrop removal from compressed images, where raindrops obscure large areas of the background and compression leads to the loss of high-frequency (HF) information. The restoration of the former requires global contextual information, while the latter necessitates guidance for high-frequency details, resulting in a conflict in utilizing these two types of information when designing existing methods. To address this issue, we propose a novel transformer architecture that leverages the advantages of attention mechanism and HF-friendly design to effectively restore the compressed raindrop images at the framework, component, and module levels. Specifically, at the framework level, we integrate relative position multi-head self-attention and convolutional layers into the proposed low-high-frequency transformer (LHFT), where the former captures global contextual information and the latter focuses on high-frequency information. Their combination effectively resolves the issue of mixed degradation. At the component level, we utilize high-frequency depth-wise convolution (HFDC) with zero-mean kernels to improve the capability to extract high-frequency features, drawing inspiration from typical high-frequency filters like Prewitt and Sobel operators. Finally, at the module level, we introduce a low-high-attention module (LHAM) to adaptively allocate the importance of low and high frequencies along channels for effective fusion. We establish the JPEG-compressed raindrop image dataset and conduct extensive experiments on different compression rates. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods without increasing computational costs.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"2070-2082"},"PeriodicalIF":8.4000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814691/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
When transmission-medium and compression degradations are intertwined, new challenges emerge. This study addresses the problem of raindrop removal from compressed images, where raindrops obscure large areas of the background and compression leads to the loss of high-frequency (HF) information. Restoring the former requires global contextual information, while the latter needs guidance toward high-frequency details, creating a conflict in how existing methods utilize these two types of information. To address this issue, we propose a novel transformer architecture that leverages the advantages of the attention mechanism and an HF-friendly design to effectively restore compressed raindrop images at the framework, component, and module levels. Specifically, at the framework level, we integrate relative-position multi-head self-attention and convolutional layers into the proposed low-high-frequency transformer (LHFT), where the former captures global contextual information and the latter focuses on high-frequency information. Their combination effectively resolves the issue of mixed degradation. At the component level, we employ high-frequency depth-wise convolution (HFDC) with zero-mean kernels to improve the extraction of high-frequency features, drawing inspiration from classical high-frequency filters such as the Prewitt and Sobel operators. Finally, at the module level, we introduce a low-high-attention module (LHAM) that adaptively allocates the importance of low and high frequencies along channels for effective fusion. We establish a JPEG-compressed raindrop image dataset and conduct extensive experiments at different compression rates. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods without increasing computational costs.
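The two most concrete mechanisms described above, zero-mean depth-wise convolution for high-frequency extraction and channel-wise weighting of low- and high-frequency branches, can be sketched in PyTorch as follows. This is a minimal illustration of the general ideas only; the class and parameter names (HFDepthwiseConv, LowHighAttentionFusion, reduction, etc.) are assumptions of mine, and the paper's actual HFDC and LHAM designs may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HFDepthwiseConv(nn.Module):
    """Depth-wise convolution whose kernels are re-centered to be zero-mean,
    so each filter acts as a high-pass operator (Prewitt/Sobel kernels also
    have coefficients that sum to zero)."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2,
                              groups=channels, bias=False)

    def forward(self, x):
        w = self.conv.weight
        # Subtract the per-kernel mean so every kernel sums to zero,
        # suppressing the DC (low-frequency) component of the response.
        w = w - w.mean(dim=(2, 3), keepdim=True)
        return F.conv2d(x, w, padding=self.conv.padding,
                        groups=self.conv.groups)


class LowHighAttentionFusion(nn.Module):
    """Channel attention that adaptively weights a low-frequency and a
    high-frequency feature branch before summing them (a rough analogue
    of the LHAM idea, not the paper's exact module)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
        )

    def forward(self, low, high):
        # Per-channel importance scores for the two branches.
        scores = torch.sigmoid(self.gate(torch.cat([low, high], dim=1)))
        w_low, w_high = scores.chunk(2, dim=1)
        return w_low * low + w_high * high


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)          # dummy feature map
    hf = HFDepthwiseConv(32)
    fuse = LowHighAttentionFusion(32)
    print(fuse(x, hf(x)).shape)              # torch.Size([1, 32, 64, 64])
```

The zero-mean constraint is applied at forward time rather than baked into the stored weights, so gradients still flow through the unconstrained parameters during training; this is one common way to realize such a constraint, though the paper may enforce it differently.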
Journal Introduction:
The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsoring societies, providing comprehensive coverage of multimedia research.