NVRC: Neural Video Representation Compression
Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull
arXiv - EE - Image and Video Processing, 2024-09-11, DOI: arxiv-2409.07414
Abstract
Recent advances in implicit neural representation (INR)-based video coding
have demonstrated its potential to compete with both conventional and other
learning-based approaches. With INR methods, a neural network is trained to
overfit a video sequence, with its parameters compressed to obtain a compact
representation of the video content. However, although promising results have
been achieved, the best INR-based methods are still outperformed by the latest
standard codecs, such as VVC VTM, partly due to the simple model compression
techniques employed. In this paper, rather than focusing on representation
architectures as in many existing works, we propose a novel INR-based video
compression framework, Neural Video Representation Compression (NVRC),
targeting compression of the representation. Based on the novel entropy coding
and quantization models proposed, NVRC, for the first time, is able to optimize
an INR-based video codec in a fully end-to-end manner. To further minimize the
additional bitrate overhead introduced by the entropy models, we also
propose a new model compression framework for coding all the network,
quantization and entropy model parameters hierarchically. Our experiments show
that NVRC outperforms many conventional and learning-based benchmark codecs,
with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset,
measured in PSNR. As far as we are aware, this is the first time an INR-based
video codec has achieved such performance. The implementation of NVRC will be
released at www.github.com.
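
To make the idea of compressing a representation's parameters concrete, the
sketch below illustrates a generic rate-distortion trade-off for quantized
parameters. It is not the NVRC method: it assumes plain uniform scalar
quantization, estimates the rate from the empirical symbol distribution instead
of a learned entropy model, and measures distortion on the parameters
themselves as a stand-in for video reconstruction error. All function names
and the weighting factor `lam` are hypothetical.

```python
import math
from collections import Counter


def quantize(params, step):
    """Uniform scalar quantization: map each parameter to an integer symbol."""
    return [round(p / step) for p in params]


def dequantize(symbols, step):
    """Map integer symbols back to reconstructed parameter values."""
    return [s * step for s in symbols]


def empirical_bits(symbols):
    """Ideal code length in bits under the symbols' own empirical distribution.

    Assumption: this replaces a learned entropy model with a simple histogram.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rate_distortion_cost(params, step, lam):
    """Return (cost, rate, distortion) with cost = rate + lam * distortion.

    Assumption: distortion is measured on the parameters, not on decoded video.
    """
    symbols = quantize(params, step)
    rate = empirical_bits(symbols)
    distortion = mse(params, dequantize(symbols, step))
    return rate + lam * distortion, rate, distortion


if __name__ == "__main__":
    params = [0.0, 0.1, 0.2, 0.3]
    for step in (0.1, 1.0):
        cost, rate, distortion = rate_distortion_cost(params, step, lam=100.0)
        print(f"step={step}: rate={rate:.2f} bits, "
              f"distortion={distortion:.4f}, cost={cost:.2f}")
```

A coarser step lowers the estimated rate and raises the distortion; an
end-to-end approach of the kind the abstract describes would optimize this
trade-off jointly with the representation rather than fixing the step by hand.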