Rate-Distortion-Optimized Deep Preprocessing for JPEG Compression

IF 11.1 · CAS Zone 1 (Engineering & Technology) · JCR Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology · Pub Date: 2025-03-14 · DOI: 10.1109/TCSVT.2025.3550872

Fan Ye;Bojun Liu;Li Li;Dong Liu

Volume 35, Issue 8, Pages 8330-8343 · Full text: https://ieeexplore.ieee.org/document/10925482/

Citations: 0

Abstract


Rate-Distortion-Optimized Deep Preprocessing for JPEG Compression

JPEG is used daily to compress natural images, but the compressed images often contain visually annoying artifacts, especially at low rates. To reduce these artifacts, deep-learning-based preprocessing of the image before JPEG compression has been proposed, which maintains standard compliance. However, existing methods have not been fully justified from a rate-distortion optimization perspective. We address this limitation and propose a truly rate-distortion-optimized deep preprocessing method for JPEG compression. We decompose the rate-distortion cost into three parts: rate, distortion, and Lagrangian multiplier. First, we design a rate estimation network and train it to estimate the JPEG compression rate. Second, we estimate the actual end-to-end distortion (between the original and reconstructed images) with a differentiable JPEG simulator, for which we specifically design an adaptive masking algorithm in the discrete cosine transform (DCT) domain. Third, we estimate the actual content-dependent Lagrangian multipliers, which combine rate and distortion into a joint loss function that drives the training of the preprocessing network. Our method makes no change to the JPEG encoder or decoder and supports any differentiable distortion measure (e.g., MSE, MS-SSIM, LPIPS). On the Kodak dataset, our method achieves an average BD-rate reduction of 7.59% relative to the JPEG baseline when using MSE. With per-image optimization for LPIPS, it achieves a BD-rate reduction of up to 38.65% and produces high-quality reconstructed images with far fewer artifacts.


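Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months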

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.