Dynamic kernel-based adaptive spatial aggregation for learned image compression

Impact factor: 2.6 | CAS Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation, Volume 110, Article 104456. Pub Date: 2025-04-15. DOI: 10.1016/j.jvcir.2025.104456

Huairui Wang, Nianxiang Fu, Zhenzhong Chen, Shan Liu


Citations: 0

Abstract

Learned image compression methods have shown remarkable performance and expansion potential compared to traditional codecs. Currently, there are two mainstream frameworks for transform coding in learned image compression: one uses stacked convolutions and the other uses window-based self-attention, and most of them aggregate valuable dependencies within a fixed spatial range. In this paper, we focus on extending content-adaptive aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets and uses dynamic sampling convolution to capture valuable information for the transform. With the adaptive aggregation strategy and the weight-sharing mechanism, our method can achieve promising transform capability with acceptable model complexity. In addition, considering the coarse hyperprior, the channel-wise context, and the spatial context, we formulate a generalized entropy model. Based on it, we introduce a dynamic kernel into the hyperprior to generate a more expressive side-information context. Furthermore, based on an investigation of the spatial and variance characteristics of the grouped latents, we propose an asymmetric sparse entropy model. The proposed entropy model facilitates entropy coding by reducing statistical redundancy while maintaining inference efficiency. Experimental results demonstrate that our method achieves superior rate–distortion performance on three benchmarks compared to state-of-the-art learning-based methods.
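The abstract does not give layer definitions, so the following is only a minimal pure-Python sketch of the two general mechanisms it refers to, under stated assumptions. `dynamic_sampling_conv` illustrates offset-based (deformable-style) sampling: one 3x3 kernel, shared across all positions, is applied at sampling locations displaced by per-pixel, per-tap offsets, using bilinear interpolation and zero padding on a single-channel 2-D grid. In the paper the offsets are generated from the content by a learned network; here they are simply passed in by the caller. `gaussian_bits` illustrates the discretized-Gaussian rate estimate commonly used in hyperprior-based entropy models, where the mean and scale would be predicted from the hyperprior and the channel-wise and spatial contexts; it does not reproduce the paper's generalized or asymmetric sparse entropy model. All function names are hypothetical.

```python
import math


def bilinear_sample(img, y, x):
    """Sample 2-D list `img` at fractional (y, x); points outside the grid read as zero."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight shrinks linearly with distance to the integer grid point.
                wy = 1.0 - abs(y - yi)
                wx = 1.0 - abs(x - xi)
                total += wy * wx * img[yi][xi]
    return total


def dynamic_sampling_conv(img, weights, offsets):
    """3x3 cross-correlation whose taps are displaced by per-pixel, per-tap offsets.

    Hypothetical sketch: `weights` is a single 3x3 kernel shared by every output
    position; offsets[i][j][k] = (dy, dx) moves tap k (row-major over the 3x3 grid)
    for output pixel (i, j). In a learned codec these offsets would be predicted
    from the features; here they are supplied directly.
    """
    h, w = len(img), len(img[0])
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (ky, kx) in enumerate(taps):
                dy, dx = offsets[i][j][k]
                # Regular grid position plus a content-dependent displacement.
                acc += weights[ky + 1][kx + 1] * bilinear_sample(img, i + ky + dy, j + kx + dx)
            out[i][j] = acc
    return out


def gaussian_bits(y_hat, mu, sigma):
    """Estimated bits for integer symbol y_hat under N(mu, sigma^2) integrated over a unit bin."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))

    p = cdf(y_hat + 0.5) - cdf(y_hat - 0.5)
    # Floor the probability to avoid log(0) for symbols far from the mean.
    return -math.log2(max(p, 1e-9))
```

With all offsets set to zero, `dynamic_sampling_conv` reduces to an ordinary zero-padded 3x3 convolution; non-zero offsets let each output pixel aggregate from content-dependent locations while the kernel weights stay shared, which mirrors the combination of adaptive aggregation and weight sharing described in the abstract.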




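Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months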

Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.