On Pre-chewing Compression Degradation for Learned Video Compression

2022 IEEE International Conference on Visual Communications and Image Processing (VCIP). Pub Date: 2022-12-13. DOI: 10.1109/VCIP56404.2022.10008873

Man M. Ho, Heming Sun, Zhiqiang Zhang, Jinjia Zhou

Abstract

Artificial Intelligence (AI) needs huge amounts of data, and so does Learned Restoration for Video Compression. There are two main problems regarding training data. 1) Preparing training compression degradation with a video codec (e.g., Versatile Video Coding, VVC) consumes considerable resources; in particular, the more Quantization Parameters (QPs) we compress with, the more coding time and storage are required. 2) The common practice of training a newly initialized Restoration Network on pure compression degradation from the start is not effective. To solve these problems, we propose a Degradation Network to pre-chew (generalize and learn to synthesize) the real compression degradation, and then present a hybrid training scheme that allows a Restoration Network to be trained on unlimited videos without compression. Concretely, we propose a QP-wise Degradation Network that learns to compress video frames like VVC in real time and can transform its degradation output between QPs linearly. The real compression degradation is thus pre-chewed, as our Degradation Network can synthesize more generalized degradation that is easier for a newly initialized Restoration Network to learn. To diversify the training video content without compression and to avoid overfitting, we design a Training Framework for Semi-Compression Degradation (TF-SCD) to train our model on many fake compressed videos together with real compressed videos. As a result, the Restoration Network can quickly reach a near-optimal point at the beginning of training, demonstrating the promise of using pre-chewed data for the very first steps of training. In other words, a newly initialized Learned Video Compression model can be warmed up efficiently and effectively with our pre-trained Degradation Network. In addition, our proposed TF-SCD can further enhance restoration performance within a specific range of QPs and provides better generalization across QPs compared with the common way of training a restoration model.
Our work is available at https://minhmanho.github.io/prechewing_degradation.
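The abstract states that the QP-wise Degradation Network can transform its output between QPs linearly and that TF-SCD trains on fake compressed videos together with real ones, but it gives no implementation details. The Python sketch below illustrates both ideas under explicit assumptions that are not taken from the paper: frames are toy lists of floats, the degradation residual is assumed to vary linearly between two anchor QPs, `degrade` is a placeholder for a trained degradation network, and `p_real` is a hypothetical mixing probability. All names (`interpolate_degradation`, `sample_training_pair`, `toy_degrade`) are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for a video frame: a flat list of pixel intensities in [0, 1].
Frame = List[float]
# A training sample: (degraded input, clean target).
Pair = Tuple[Frame, Frame]


def interpolate_degradation(clean: Frame, degraded_lo: Frame, degraded_hi: Frame,
                            qp: float, qp_lo: float, qp_hi: float) -> Frame:
    """Linearly blend the degradation residuals of two anchor QPs.

    Hypothetical: assumes the residual (degraded - clean) varies linearly with
    QP between qp_lo and qp_hi. This illustrates the idea only; it is not the
    paper's network.
    """
    if not len(clean) == len(degraded_lo) == len(degraded_hi):
        raise ValueError("frames must have the same length")
    if not qp_lo < qp_hi:
        raise ValueError("qp_lo must be smaller than qp_hi")
    if not qp_lo <= qp <= qp_hi:
        raise ValueError("qp must lie within [qp_lo, qp_hi]")
    t = (qp - qp_lo) / (qp_hi - qp_lo)  # 0 at qp_lo, 1 at qp_hi
    return [c + (1.0 - t) * (lo - c) + t * (hi - c)
            for c, lo, hi in zip(clean, degraded_lo, degraded_hi)]


def sample_training_pair(real_pairs: Sequence[Pair],
                         clean_frames: Sequence[Frame],
                         degrade: Callable[[Frame, float], Frame],
                         qp_range: Tuple[float, float],
                         p_real: float,
                         rng: random.Random) -> Pair:
    """Draw one (input, target) pair from a mix of real and synthetic data.

    With probability p_real, return a real codec-compressed pair; otherwise
    synthesize a "fake compressed" input from an uncompressed frame using
    `degrade`, a placeholder for a trained degradation network.
    """
    if rng.random() < p_real:
        return rng.choice(real_pairs)
    clean = rng.choice(clean_frames)
    qp = rng.uniform(*qp_range)
    return degrade(clean, qp), clean


def toy_degrade(frame: Frame, qp: float) -> Frame:
    """Toy degradation (hypothetical): pull pixels toward mid-gray as QP grows."""
    return [x + (0.5 - x) * qp / 100.0 for x in frame]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.0, 0.5, 1.0]
    # Pretend this pair came from a real codec; here it is also synthetic.
    real = [(toy_degrade(clean, 37), clean)]
    batch = [sample_training_pair(real, [clean], toy_degrade, (22, 42), 0.5, rng)
             for _ in range(4)]
    print(len(batch), "pairs sampled")
    print(interpolate_degradation(clean, toy_degrade(clean, 22),
                                  toy_degrade(clean, 42), 32, 22, 42))
```

Because `toy_degrade` is itself linear in QP, interpolating its QP 22 and QP 42 outputs at QP 32 reproduces `toy_degrade(clean, 32)` up to floating-point error; a learned degradation network need not behave this way, which is why linearity is stated here as an assumption. The fixed `p_real` is likewise a simplification, since the abstract does not say how real and synthetic samples are balanced.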
