{"title":"HFGlobalFormer: When High-Frequency Recovery Meets Global Context Modeling for Compressed Image Deraindrop","authors":"Rongqun Lin;Wenhan Yang;Baoliang Chen;Pingping Zhang;Yue Liu;Shiqi Wang;Sam Kwong","doi":"10.1109/TMM.2024.3521831","DOIUrl":null,"url":null,"abstract":"When transmission medium and compression degradation are intertwined, new challenges emerge. This study addresses the problem of raindrop removal from compressed images, where raindrops obscure large areas of the background and compression leads to the loss of high-frequency (HF) information. The restoration of the former requires global contextual information, while the latter necessitates guidance for high-frequency details, resulting in a conflict in utilizing these two types of information when designing existing methods. To address this issue, we propose a novel transformer architecture that leverages the advantages of attention mechanism and HF-friendly design to effectively restore the compressed raindrop images at the framework, component, and module levels. Specifically, at the framework level, we integrate relative position multi-head self-attention and convolutional layers into the proposed low-high-frequency transformer (LHFT), where the former captures global contextual information and the latter focuses on high-frequency information. Their combination effectively resolves the issue of mixed degradation. At the component level, we utilize high-frequency depth-wise convolution (HFDC) with zero-mean kernels to improve the capability to extract high-frequency features, drawing inspiration from typical high-frequency filters like Prewitt and Sobel operators. Finally, at the module level, we introduce a low-high-attention module (LHAM) to adaptively allocate the importance of low and high frequencies along channels for effective fusion. We establish the JPEG-compressed raindrop image dataset and conduct extensive experiments on different compression rates. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods without increasing computational costs.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"2070-2082"},"PeriodicalIF":8.4000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814691/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Abstract
When transmission-medium and compression degradations are intertwined, new challenges emerge. This study addresses the problem of raindrop removal from compressed images, where raindrops obscure large areas of the background and compression leads to the loss of high-frequency (HF) information. Restoring the former requires global contextual information, while the latter needs guidance toward high-frequency details, creating a conflict in how existing methods utilize these two types of information. To address this issue, we propose a novel transformer architecture that leverages the advantages of the attention mechanism and an HF-friendly design to effectively restore compressed raindrop images at the framework, component, and module levels. Specifically, at the framework level, we integrate relative-position multi-head self-attention and convolutional layers into the proposed low-high-frequency transformer (LHFT), where the former captures global contextual information and the latter focuses on high-frequency information. Their combination effectively resolves the issue of mixed degradation. At the component level, we employ high-frequency depth-wise convolution (HFDC) with zero-mean kernels to improve the extraction of high-frequency features, drawing inspiration from classical high-frequency filters such as the Prewitt and Sobel operators. Finally, at the module level, we introduce a low-high-attention module (LHAM) that adaptively allocates the importance of low and high frequencies along channels for effective fusion. We establish a JPEG-compressed raindrop image dataset and conduct extensive experiments at different compression rates. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods without increasing computational costs.
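The two most concrete mechanisms described above, zero-mean depth-wise convolution for high-frequency extraction and channel-wise weighting of low- and high-frequency branches, can be sketched in PyTorch as follows. This is a minimal illustration of the general ideas only; the class and parameter names (HFDepthwiseConv, LowHighAttentionFusion, reduction, etc.) are assumptions of mine, and the paper's actual HFDC and LHAM designs may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HFDepthwiseConv(nn.Module):
    """Depth-wise convolution whose kernels are re-centered to be zero-mean,
    so each filter acts as a high-pass operator (Prewitt/Sobel kernels also
    have coefficients that sum to zero)."""

    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2,
                              groups=channels, bias=False)

    def forward(self, x):
        w = self.conv.weight
        # Subtract the per-kernel mean so every kernel sums to zero,
        # suppressing the DC (low-frequency) component of the response.
        w = w - w.mean(dim=(2, 3), keepdim=True)
        return F.conv2d(x, w, padding=self.conv.padding,
                        groups=self.conv.groups)


class LowHighAttentionFusion(nn.Module):
    """Channel attention that adaptively weights a low-frequency and a
    high-frequency feature branch before summing them (a rough analogue
    of the LHAM idea, not the paper's exact module)."""

    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
        )

    def forward(self, low, high):
        # Per-channel importance scores for the two branches.
        scores = torch.sigmoid(self.gate(torch.cat([low, high], dim=1)))
        w_low, w_high = scores.chunk(2, dim=1)
        return w_low * low + w_high * high


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)          # dummy feature map
    hf = HFDepthwiseConv(32)
    fuse = LowHighAttentionFusion(32)
    print(fuse(x, hf(x)).shape)              # torch.Size([1, 32, 64, 64])
```

The zero-mean constraint is applied at forward time rather than baked into the stored weights, so gradients still flow through the unconstrained parameters during training; this is one common way to realize such a constraint, though the paper may enforce it differently.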
Journal Introduction:
The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. Its scope aligns with the Fields of Interest of its sponsoring societies, providing comprehensive coverage of multimedia research.