LLIC: Large Receptive Field Transform Coding With Adaptive Weights for Learned Image Compression

IF 8.4 · CAS Region 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Wei Jiang;Peirong Ning;Jiayu Yang;Yongqi Zhai;Feng Gao;Ronggang Wang
{"title":"LLIC: 利用自适应权重的大接收场变换编码技术实现学习图像压缩","authors":"Wei Jiang;Peirong Ning;Jiayu Yang;Yongqi Zhai;Feng Gao;Ronggang Wang","doi":"10.1109/TMM.2024.3416831","DOIUrl":null,"url":null,"abstract":"The effective receptive field (ERF) plays an important role in transform coding, which determines how much redundancy can be removed during transform and how many spatial priors can be utilized to synthesize textures during inverse transform. Existing methods rely on stacks of small kernels, whose ERFs remain insufficiently large, or heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the \n<italic>first</i>\n time in the learned image compression community, we introduce \n<italic>a few</i>\n large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Due to the wide range of image diversity, we further propose a mechanism to augment convolution adaptability through the self-conditioned generation of weights. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter point-wise interactions. Our investigation extends to refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit allocation strategy that autonomously generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model to compare with existing transform methods and obtain models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that our proposed LLIC models have significant improvements over the corresponding baselines and reduce the BD-Rate by \n<inline-formula><tex-math>$9.49\\%, 9.47\\%,\\;\\text{and}\\; 10.94\\%$</tex-math></inline-formula>\n on Kodak over VTM-17.0 Intra, respectively. Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"10937-10951"},"PeriodicalIF":8.4000,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLIC: Large Receptive Field Transform Coding With Adaptive Weights for Learned Image Compression\",\"authors\":\"Wei Jiang;Peirong Ning;Jiayu Yang;Yongqi Zhai;Feng Gao;Ronggang Wang\",\"doi\":\"10.1109/TMM.2024.3416831\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The effective receptive field (ERF) plays an important role in transform coding, which determines how much redundancy can be removed during transform and how many spatial priors can be utilized to synthesize textures during inverse transform. Existing methods rely on stacks of small kernels, whose ERFs remain insufficiently large, or heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). 
Specifically, for the \\n<italic>first</i>\\n time in the learned image compression community, we introduce \\n<italic>a few</i>\\n large kernel-based depth-wise convolutions to reduce more redundancy while maintaining modest complexity. Due to the wide range of image diversity, we further propose a mechanism to augment convolution adaptability through the self-conditioned generation of weights. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter point-wise interactions. Our investigation extends to refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit allocation strategy that autonomously generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model to compare with existing transform methods and obtain models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that our proposed LLIC models have significant improvements over the corresponding baselines and reduce the BD-Rate by \\n<inline-formula><tex-math>$9.49\\\\%, 9.47\\\\%,\\\\;\\\\text{and}\\\\; 10.94\\\\%$</tex-math></inline-formula>\\n on Kodak over VTM-17.0 Intra, respectively. Our LLIC models achieve state-of-the-art performances and better trade-offs between performance and complexity.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"26 \",\"pages\":\"10937-10951\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2024-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10564141/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10564141/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

The effective receptive field (ERF) plays an important role in transform coding: it determines how much redundancy can be removed during the transform and how many spatial priors can be utilized to synthesize textures during the inverse transform. Existing methods rely on stacks of small kernels, whose ERFs remain insufficiently large, or on heavy non-local attention mechanisms, which limit the potential of high-resolution image coding. To tackle this issue, we propose Large Receptive Field Transform Coding with Adaptive Weights for Learned Image Compression (LLIC). Specifically, for the first time in the learned image compression community, we introduce a few large-kernel depth-wise convolutions to remove more redundancy while maintaining modest complexity. Because images are highly diverse, we further propose a mechanism to augment convolution adaptability through the self-conditioned generation of weights. The large kernels cooperate with non-linear embedding and gate mechanisms for better expressiveness and lighter point-wise interactions. Our investigation extends to refined training methods that unlock the full potential of these large kernels. Moreover, to promote more dynamic inter-channel interactions, we introduce an adaptive channel-wise bit allocation strategy that autonomously generates channel importance factors in a self-conditioned manner. To demonstrate the effectiveness of the proposed transform coding, we align the entropy model to compare with existing transform methods and obtain the models LLIC-STF, LLIC-ELIC, and LLIC-TCM. Extensive experiments demonstrate that our proposed LLIC models improve significantly over the corresponding baselines and reduce the BD-Rate on Kodak over VTM-17.0 Intra by 9.49%, 9.47%, and 10.94%, respectively. Our LLIC models achieve state-of-the-art performance and better trade-offs between performance and complexity.
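The abstract's core mechanism, a large-kernel depth-wise convolution whose per-channel weights are modulated by factors generated from the input itself, can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical illustration of that idea (self-conditioned channel modulation combined with a point-wise gate); the class, method, and parameter names are our own assumptions for exposition and are not taken from the paper's implementation.

```python
# Minimal sketch (not the authors' code): a large-kernel depth-wise
# convolution with self-conditioned per-channel weight modulation and a
# point-wise gate, loosely following the mechanisms named in the abstract.
import torch
import torch.nn as nn


class SelfConditionedLargeKernelBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 11):
        super().__init__()
        # Large-kernel depth-wise convolution: one k x k filter per channel,
        # widening the effective receptive field at modest cost.
        self.dw_conv = nn.Conv2d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        # Self-conditioned weight generation: per-channel modulation factors
        # derived from global statistics of the input feature map.
        self.weight_gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        # Lightweight point-wise branches for gating and projection.
        self.gate = nn.Conv2d(channels, channels, 1)
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.weight_gen(x)                     # (N, C, 1, 1) adaptive factors
        spatial = self.dw_conv(x) * scale              # adaptively weighted large-kernel output
        gated = spatial * torch.sigmoid(self.gate(x))  # gated point-wise interaction
        return x + self.proj(gated)                    # residual connection


if __name__ == "__main__":
    block = SelfConditionedLargeKernelBlock(channels=192, kernel_size=11)
    y = block(torch.randn(1, 192, 16, 16))
    print(y.shape)  # torch.Size([1, 192, 16, 16])
```

The depth-wise grouping keeps the parameter count of the 11x11 kernel linear in the channel count, which is why a few such layers can enlarge the ERF while, as the abstract claims, maintaining modest complexity.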
Source Journal
IEEE Transactions on Multimedia
Category: Engineering & Technology - Telecommunications
CiteScore: 11.70
Self-citation rate: 11.00%
Articles per year: 576
Review time: 5.5 months
Journal introduction: The IEEE Transactions on Multimedia covers diverse aspects of multimedia technology and applications, including circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of multimedia research.