Generative Probabilistic Entropy Modeling With Conditional Diffusion for Learned Image Compression

IF 11.1 | CAS Region 1, Engineering & Technology | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Maida Cao;Wenrui Dai;Shaohui Li;Chenglin Li;Junni Zou;Weisheng Hu;Hongkai Xiong
DOI: 10.1109/TCSVT.2025.3551780
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9443-9459
Publication date: 2025-03-18 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10929015/
Citations: 0

Abstract

Entropy modeling is the core component of learned image compression (LIC): it models the distribution of the latent representation learned from input images via neural networks for bit-rate estimation. However, existing entropy models employ presumed parameterized distributions such as Gaussian models, and are limited for learned latent representations characterized by complex distributions. To address this problem, in this paper we achieve, for the first time, generative probabilistic entropy modeling of the latent representation based on conditional diffusion models. Specifically, we propose a conditional diffusion-based probabilistic entropy model (CDPEM) that parameterizes the latent representation with distributions of arbitrary form, generated by a well-designed training-test-consistent denoising diffusion implicit model (TC-DDIM) without introducing any presumption. TC-DDIM leverages ancestral sampling to gradually approximate the distribution of the latent representation while guaranteeing that generation is consistent between training and test. Furthermore, we develop a hierarchical spatial-channel context model that works with TC-DDIM to fully exploit spatial correlations, via the approximate contextual information produced by ancestral sampling, and channel-wise correlations, via channel-wise information aggregation with a reweighted training loss. Experimental results demonstrate that the proposed entropy model achieves state-of-the-art performance on the Kodak, CLIC, and Tecnick datasets compared to existing LIC methods. Remarkably, when incorporated with recent baselines, the proposed model outperforms the latest VVC standard with an evident gain in R-D performance.
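The abstract contrasts presumed parametric entropy models (e.g., discretized Gaussians) with the proposed generative approach. As background, here is a minimal sketch of how such a conventional Gaussian entropy model turns a predicted mean and scale into an estimated bit cost for a quantized latent symbol; the function names and constants are illustrative, not taken from the paper's implementation:

```python
import math

def gaussian_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rate_bits(y: int, mu: float, sigma: float) -> float:
    """Estimated code length (in bits) of a quantized latent symbol y
    under a discretized Gaussian: P(y) = CDF(y + 0.5) - CDF(y - 0.5),
    rate = -log2 P(y). A small floor keeps the log finite."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-9))

# A symbol near the predicted mean is cheap; a far-tail outlier is expensive.
cheap = rate_bits(0, mu=0.0, sigma=1.0)   # roughly 1.4 bits
costly = rate_bits(5, mu=0.0, sigma=1.0)  # many bits (far tail)
```

In a real codec the predicted (mu, sigma) come from hyperprior and context networks, and the same probabilities drive the arithmetic coder; CDPEM's point is to replace this fixed Gaussian form with a distribution generated by conditional diffusion.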
Source journal metrics
CiteScore: 13.80
Self-citation rate: 27.40%
Articles per year: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.