{"title":"基于条件扩散的生成概率熵模型在学习图像压缩中的应用","authors":"Maida Cao;Wenrui Dai;Shaohui Li;Chenglin Li;Junni Zou;Weisheng Hu;Hongkai Xiong","doi":"10.1109/TCSVT.2025.3551780","DOIUrl":null,"url":null,"abstract":"Entropy modeling is the core component of learned image compression (LIC) that models the distribution of latent representation learned from input images via neural networks for bit-rate estimation. However, existing entropy models employ presumed parameterized distributions such as Gaussian models and are limited for the learned latent representation characterized by complex distributions. To address this problem, in this paper, we for the first time achieve generative probabilistic entropy modeling of latent representation based on conditional diffusion models. Specifically, we propose a conditional diffusion-based probabilistic entropy model (CDPEM) to parameterize the latent representation with distributions of arbitrary forms that are generated by well designed training-test consistent denoising diffusion implicit model (TC-DDIM) without introducing any presumption. TC-DDIM is designed to leverage ancestral sampling to gradually approximate the distribution of latent representation with guaranteed consistency in generation for training and test. Furthermore, we develop a hierarchical spatial-channel context model to incorporate with TC-DDIM to sufficiently exploit spatial correlations with the approximate contextual information produced by ancestral sampling and channel-wise correlations using channel-wise information aggregation with reweighted training loss. Experimental results demonstrate that the proposed entropy model achieves state-of-the-art performance on the Kodak, CLIC, and Tecnick datasets compared to existing LIC methods. Remarkably, when incorporated with recent baselines, the proposed model outperforms latest VVC standard by an evident gain in R-D performance.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9443-9459"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generative Probabilistic Entropy Modeling With Conditional Diffusion for Learned Image Compression\",\"authors\":\"Maida Cao;Wenrui Dai;Shaohui Li;Chenglin Li;Junni Zou;Weisheng Hu;Hongkai Xiong\",\"doi\":\"10.1109/TCSVT.2025.3551780\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Entropy modeling is the core component of learned image compression (LIC) that models the distribution of latent representation learned from input images via neural networks for bit-rate estimation. However, existing entropy models employ presumed parameterized distributions such as Gaussian models and are limited for the learned latent representation characterized by complex distributions. To address this problem, in this paper, we for the first time achieve generative probabilistic entropy modeling of latent representation based on conditional diffusion models. Specifically, we propose a conditional diffusion-based probabilistic entropy model (CDPEM) to parameterize the latent representation with distributions of arbitrary forms that are generated by well designed training-test consistent denoising diffusion implicit model (TC-DDIM) without introducing any presumption. 
TC-DDIM is designed to leverage ancestral sampling to gradually approximate the distribution of latent representation with guaranteed consistency in generation for training and test. Furthermore, we develop a hierarchical spatial-channel context model to incorporate with TC-DDIM to sufficiently exploit spatial correlations with the approximate contextual information produced by ancestral sampling and channel-wise correlations using channel-wise information aggregation with reweighted training loss. Experimental results demonstrate that the proposed entropy model achieves state-of-the-art performance on the Kodak, CLIC, and Tecnick datasets compared to existing LIC methods. Remarkably, when incorporated with recent baselines, the proposed model outperforms latest VVC standard by an evident gain in R-D performance.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"9443-9459\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10929015/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10929015/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Generative Probabilistic Entropy Modeling With Conditional Diffusion for Learned Image Compression
Entropy modeling is the core component of learned image compression (LIC): it models the distribution of the latent representation learned from input images via neural networks for bit-rate estimation. However, existing entropy models employ presumed parameterized distributions, such as Gaussian models, and are therefore limited for learned latent representations characterized by complex distributions. To address this problem, in this paper we achieve, for the first time, generative probabilistic entropy modeling of the latent representation based on conditional diffusion models. Specifically, we propose a conditional diffusion-based probabilistic entropy model (CDPEM) that parameterizes the latent representation with distributions of arbitrary form, generated by a well-designed training-test-consistent denoising diffusion implicit model (TC-DDIM) without introducing any presumption. TC-DDIM leverages ancestral sampling to gradually approximate the distribution of the latent representation while guaranteeing that the generation process is consistent between training and testing. Furthermore, we develop a hierarchical spatial-channel context model that is incorporated with TC-DDIM to sufficiently exploit spatial correlations, via the approximate contextual information produced by ancestral sampling, and channel-wise correlations, via channel-wise information aggregation with a reweighted training loss. Experimental results demonstrate that the proposed entropy model achieves state-of-the-art performance on the Kodak, CLIC, and Tecnick datasets compared to existing LIC methods. Remarkably, when incorporated with recent baselines, the proposed model outperforms the latest VVC standard by an evident margin in rate-distortion (R-D) performance.
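The abstract builds on conditional DDIM-style ancestral sampling to approximate the latent distribution. Since the paper's exact TC-DDIM formulation is not reproduced here, the sketch below shows only the generic conditional DDIM reverse-process update (the standard denoising diffusion implicit model sampler) in plain NumPy; `eps_model`, `context`, the toy noise schedule, and all shapes are hypothetical placeholders, not the authors' CDPEM components.

```python
# Minimal sketch of a conditional DDIM reverse process (generic technique,
# not the paper's TC-DDIM). The noise-prediction network and the conditioning
# tensor are assumptions for illustration only.
import numpy as np

def ddim_sample(eps_model, context, shape, alpha_bar, eta=0.0, rng=None):
    """Run a conditional DDIM sampling loop.

    eps_model(x_t, t, context) -> predicted noise, same shape as x_t
    alpha_bar: cumulative products of the noise schedule, alpha_bar[t]
    eta=0.0 gives the deterministic DDIM sampler; eta>0 re-injects noise.
    """
    rng = rng or np.random.default_rng(0)
    T = len(alpha_bar)
    x = rng.standard_normal(shape)          # start from pure Gaussian noise
    for t in reversed(range(1, T)):
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        eps = eps_model(x, t, context)      # noise prediction given the context
        # Predicted clean sample x0 from the current noisy sample.
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # DDIM variance; eta interpolates between deterministic and stochastic.
        sigma = eta * np.sqrt((1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev))
        noise = rng.standard_normal(shape) if eta > 0 else 0.0
        x = (np.sqrt(a_prev) * x0
             + np.sqrt(np.maximum(1.0 - a_prev - sigma**2, 0.0)) * eps
             + sigma * noise)
    return x

# Toy usage with a stand-in "model" that ignores its inputs (hypothetical).
alpha_bar = np.linspace(0.999, 0.01, 50)   # decreasing toy schedule of alpha_bar_t
fake_eps = lambda x, t, c: np.zeros_like(x)
sample = ddim_sample(fake_eps, context=None, shape=(4, 4), alpha_bar=alpha_bar)
```

In an LIC setting, a sampler of this kind would be conditioned on side information (e.g., a hyperprior or previously decoded context) and its output used to estimate per-element probabilities for bit-rate computation; the exact way CDPEM does this is described in the full paper, not in this sketch.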
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.