RDDM: A Rate-Distortion Guided Diffusion Model for Learned Image Compression Enhancement

IF 3.8 | Region 2, Engineering Technology | JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Sanxin Jiang;Jiro Katto;Heming Sun
DOI: 10.1109/JETCAS.2025.3563228
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 2, pp. 186-199
Published: 2025-04-22 | Journal Article | Not open access
Source: https://ieeexplore.ieee.org/document/10973607/
Citations: 0

Abstract

RDDM: A Rate-Distortion Guided Diffusion Model for Learned Image Compression Enhancement
Denoising diffusion probabilistic models (DDPMs) have achieved significant success in a variety of image generation tasks, but their application to image compression, and to learned image compression (LIC) in particular, remains limited. In this study, we introduce a rate-distortion (RD) guided diffusion model, referred to as RDDM, to enhance the performance of LIC. RDDM treats the LIC as a lossy codec function constrained by RD: through its encoding and decoding operations, the codec splits the input image into two parts, the reconstructed image and the residual image. The construction of RDDM rests on two points. First, RDDM treats diffusion models as repositories of image structures and textures built from extensive real-world datasets; guided by the RD constraints, it extracts the structural and textural priors needed from these repositories to restore the input image. Second, RDDM employs a Bayesian network to progressively infer the input image from the reconstructed image and its codec function. Our research also reveals that RDDM's performance declines when its codec function does not match the reconstructed image, and that using the highest-bitrate codec function minimizes this drop. The resulting model is referred to as $\text{RDDM}^{\star}$. Experimental results indicate that both RDDM and $\text{RDDM}^{\star}$ can be applied to LICs of various architectures, including CNN, Transformer, and hybrid designs. They significantly improve the fidelity of these codecs while maintaining, or to some extent even enhancing, perceptual quality.
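The abstract's view of a lossy codec as a function that splits the input into a reconstructed image and a residual image can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_codec` is a hypothetical uniform quantizer standing in for a learned codec, and `decompose` and `psnr` are illustrative helper names. It is not the authors' model and involves no diffusion step; it only shows the decomposition and how a coarser codec leaves a larger residual.

```python
import numpy as np


def toy_codec(x, step):
    """Hypothetical lossy codec: uniform quantization with step size `step`.

    Stands in for the encode/decode pair of a learned codec; a larger
    step loosely mimics a lower bitrate.
    """
    return np.round(x / step) * step


def decompose(x, step):
    """Split x into (reconstructed image, residual image) so that x = recon + residual."""
    recon = toy_codec(x, step)
    residual = x - recon
    return recon, residual


def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    mse = np.mean((x - y) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# Illustrative data: a random 8x8 "image" with values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Coarse codec (low bitrate stand-in) and fine codec (high bitrate stand-in).
recon_coarse, resid_coarse = decompose(image, step=0.25)
recon_fine, resid_fine = decompose(image, step=0.05)

print("coarse PSNR (dB):", psnr(image, recon_coarse))
print("fine PSNR (dB):  ", psnr(image, recon_fine))
```

In this toy setting the residual is exactly what an enhancement model would have to recover from the reconstructed image; the abstract's approach infers it progressively with a diffusion prior rather than reading it off directly, as is done here.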
Source journal
CiteScore: 8.50 | Self-citation rate: 2.20% | Articles published: 86
About the journal: The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.