Neural Volumetric Video Coding With Hierarchical Coded Representation of Dynamic Volume

Impact factor: 9.7 · Region 1 (Computer Science) · JCR Q1 (Computer Science, Information Systems)

IEEE Transactions on Multimedia Pub Date : 2025-02-20 DOI:10.1109/TMM.2025.3544415

Ju-Yeon Shin;Jung-Kyung Lee;Gun Bang;Jun-Sik Kim;Je-Won Kang

IEEE Transactions on Multimedia, vol. 27, pp. 4412-4426. Publication type: Journal Article. Open access: no. Full text: https://ieeexplore.ieee.org/document/10897849/

Citations: 0

Abstract

This article proposes a novel multi-view (MV) video coding technique that leverages a four-dimensional (4D) voxel-grid representation to enhance coding efficiency, particularly in novel view synthesis. Although the voxel-grid approximation provides a continuous representation for dynamic scenes, its volumetric nature requires substantial storage. The compression of MV videos can be interpreted as the compression of dense features. However, the substantial size of these features poses a significant problem for generating dynamic scenes at arbitrary viewpoints. To address this challenge, this study introduces a hierarchical coded representation of dynamic volumes based on low-rank tensor decomposition of volumetric features and develops effective coding techniques based on this representation. The proposed method employs a two-level coding strategy to capture the temporal characteristics of the decomposed features. At the higher level, spatial features are encoded; these represent 3D structural information and are time-invariant over short intervals of an MV video sequence. At the lower level, temporal features are encoded to capture the dynamics of the current scene. The spatial features are shared within a group of frames, whereas the temporal features are encoded at each time step. The experimental results demonstrate that the proposed technique outperforms existing MV video coding standards and current state-of-the-art methods, providing superior rate-distortion performance in novel view synthesis for MV video compression.
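To make the storage argument concrete, the following is a minimal sketch of a two-level representation, not the authors' implementation. The abstract does not specify the decomposition, so a rank-R CP-style factorization is assumed here purely for illustration; the names GroupCode, voxel, dense_count, and coded_count are hypothetical. The sketch shows spatial factors stored once per group of frames and a small coefficient vector stored per time step, and compares the number of stored values against a dense 4D grid.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GroupCode:
    """Hypothetical two-level code for one group of frames (assumed CP-style)."""
    # Higher level: rank-R spatial factors, shared by every frame in the group.
    fx: List[List[float]]  # R rows, each of length X
    fy: List[List[float]]  # R rows, each of length Y
    fz: List[List[float]]  # R rows, each of length Z
    # Lower level: one length-R coefficient vector per time step.
    ft: List[List[float]]  # T rows, each of length R


def voxel(code: GroupCode, x: int, y: int, z: int, t: int) -> float:
    """Reconstruct one feature value as a sum of R rank-one terms."""
    rank = len(code.fx)
    return sum(
        code.ft[t][r] * code.fx[r][x] * code.fy[r][y] * code.fz[r][z]
        for r in range(rank)
    )


def dense_count(X: int, Y: int, Z: int, T: int) -> int:
    """Values stored by a dense 4D voxel grid (one scalar feature per cell)."""
    return X * Y * Z * T


def coded_count(X: int, Y: int, Z: int, T: int, R: int) -> int:
    """Values stored by the sketch: shared spatial factors plus per-frame coefficients."""
    return R * (X + Y + Z) + T * R


if __name__ == "__main__":
    # Tiny rank-1 example: the spatial structure is fixed, only ft changes over time.
    code = GroupCode(fx=[[1.0, 2.0]], fy=[[3.0, 4.0]], fz=[[5.0, 6.0]],
                     ft=[[2.0], [0.5]])
    print(voxel(code, 1, 0, 1, 0), voxel(code, 1, 0, 1, 1))
    print(dense_count(64, 64, 64, 10), coded_count(64, 64, 64, 10, 8))
```

In this sketch, adding a frame to a group costs only R extra values, whereas a dense grid grows by X·Y·Z values per frame. The actual method additionally encodes the features into a bitstream, which this example does not attempt to model.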


Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70
Self-citation rate: 11.00%
Articles published: 576
Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.