Neural Image Compression via Attentional Multi-scale Back Projection and Frequency Decomposition

2021 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date : 2021-10-01 DOI:10.1109/ICCV48922.2021.01441

Ge Gao, P. You, Rong Pan, Shunyuan Han, Yuanyuan Zhang, Yuchao Dai, Ho-Jun Lee

Pages: 14657-14666

Citations: 45

Abstract

In recent years, neural image compression has emerged as a rapidly developing topic in computer vision, and state-of-the-art approaches now outperform their conventional counterparts in compression performance. Despite this progress, current methods still have limitations in preserving fine spatial details for optimal reconstruction, especially at low compression rates. We make three contributions to address this issue. First, we develop a novel back projection method with attentional and multi-scale feature fusion for augmented representation power. Our back projection method recalibrates the current estimation by establishing feedback connections between high-level and low-level attributes in an attentional and discriminative manner. Second, we propose to decompose the input image and process the distinct frequency components separately; the derived latents are recombined using a novel dual attention module, so that details inside regions of interest can be explicitly manipulated. Third, we propose a novel training scheme for reducing the latent rounding residual. Experimental results show that, when measured in PSNR, our model reduces BD-rate by 9.88% and 10.32% over the state-of-the-art method, and by 4.12% and 4.32% over the latest coding standard, Versatile Video Coding (VVC), on the Kodak and CLIC2020 Professional Validation datasets, respectively. Our approach also produces more visually pleasing images when optimized for MS-SSIM. These significant improvements over existing methods show the effectiveness of our method in preserving and remedying spatial information for enhanced compression quality.
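The abstract mentions a "latent rounding residual" but does not define it or describe the training scheme that reduces it. As a rough illustration only, the sketch below assumes the residual is the element-wise difference between a continuous latent value and its nearest-integer rounding, which is how quantization error is commonly described in learned compression. The names `quantize`, `rounding_residual`, and `mean_squared` are hypothetical and do not come from the paper, and the random values stand in for real encoder outputs.

```python
import random


def quantize(latent):
    """Round each latent value to the nearest integer (hard quantization)."""
    return [float(round(v)) for v in latent]


def rounding_residual(latent):
    """Element-wise difference between a latent and its rounded version.

    Assumption: this is what "latent rounding residual" refers to; the
    abstract itself gives no definition.
    """
    return [v - q for v, q in zip(latent, quantize(latent))]


def mean_squared(values):
    """Mean of the squared entries, a simple measure of residual energy."""
    return sum(v * v for v in values) / len(values)


# Stand-in latent: random values instead of a real encoder output.
rng = random.Random(0)
latent = [rng.uniform(-4.0, 4.0) for _ in range(1000)]
residual = rounding_residual(latent)

# Every residual lies within half a quantization step of zero.
print(max(abs(r) for r in residual) <= 0.5)
print(mean_squared(residual))
```

For values spread evenly between integers, the mean squared residual is close to 1/12, the variance of a uniform distribution of width one. Under the assumption above, a training scheme that reduces the residual would aim to push this quantity lower.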


