NVRC: Neural Video Representation Compression

arXiv - EE - Image and Video Processing. Pub Date: 2024-09-11. DOI: arxiv-2409.07414

Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull



Abstract

Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, and its parameters are then compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still outperformed by the latest standard codecs, such as VVC VTM, partly because of the simple model compression techniques they employ. In this paper, rather than focusing on representation architectures as many existing works do, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), which targets compression of the representation itself. Based on the proposed entropy coding and quantization models, NVRC is, for the first time, able to optimize an INR-based video codec in a fully end-to-end manner. To further minimize the additional bitrate overhead introduced by the entropy models, we also propose a new model compression framework that hierarchically codes all the network, quantization, and entropy model parameters. Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. As far as we are aware, this is the first time an INR-based video codec has achieved such performance. The implementation of NVRC will be released at www.github.com.
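To make the terms in the abstract concrete, the following toy sketch shows how quantizing network parameters, estimating their bitrate under an entropy model, and measuring PSNR can be combined into a single rate-distortion cost. It is not NVRC's method: the uniform quantizer, the discretized Gaussian entropy model, the cost form `distortion + lam * bits`, and every function name here are illustrative assumptions, not details taken from the paper.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a normal distribution, evaluated with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def quantize(params, step):
    """Uniform scalar quantization: map each parameter to an integer index."""
    return [round(p / step) for p in params]


def dequantize(indices, step):
    """Reconstruct approximate parameter values from integer indices."""
    return [q * step for q in indices]


def estimated_bits(indices, mean, scale):
    """Ideal code length in bits under a discretized Gaussian (assumed model).

    Each index q is assigned the probability mass of the interval
    [q - 0.5, q + 0.5]; an entropy coder would spend about -log2(p) bits on it.
    """
    total = 0.0
    for q in indices:
        p = gaussian_cdf(q + 0.5, mean, scale) - gaussian_cdf(q - 0.5, mean, scale)
        total += -math.log2(max(p, 1e-12))  # clamp to avoid log2(0)
    return total


def psnr(original, reconstructed, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals with the given peak value."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def rd_cost(distortion, bits, lam):
    """Rate-distortion cost; lam trades bitrate against distortion (assumed form)."""
    return distortion + lam * bits


if __name__ == "__main__":
    params = [0.3, -0.7, 1.2]  # stand-in for network weights (hypothetical values)
    for step in (0.05, 0.5):
        indices = quantize(params, step)
        bits = estimated_bits(indices, mean=0.0, scale=4.0)
        error = sum((a - b) ** 2 for a, b in zip(params, dequantize(indices, step)))
        print(step, indices, round(bits, 2), rd_cost(error, bits, lam=0.01))
```

In this toy setting a coarser step yields fewer estimated bits but a larger reconstruction error, which is the trade-off a rate-distortion objective balances. Optimizing such an objective end to end additionally requires a differentiable treatment of quantization, which this sketch does not attempt.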



