Ju-Yeon Shin;Jung-Kyung Lee;Gun Bang;Jun-Sik Kim;Je-Won Kang
{"title":"基于动态体积分层编码表示的神经体积视频编码","authors":"Ju-Yeon Shin;Jung-Kyung Lee;Gun Bang;Jun-Sik Kim;Je-Won Kang","doi":"10.1109/TMM.2025.3544415","DOIUrl":null,"url":null,"abstract":"This article proposes a novel multi-view (MV) video coding technique that leverages a four-dimensional (4D) voxel-grid representation to enhance coding efficiency, particularly in novel view synthesis. Although the voxel grid approximation provides a continuous representation for dynamic scenes, its volumetric nature requires substantial storage. The compression of MV videos can be interpreted as the compression of dense features. However, the substantial size of these features poses a significant problem relative to the generation of dynamic scenes at arbitrary viewpoints. To address this challenge, this study introduces a hierarchical coded representation of dynamic volumes based on low-rank tensor decomposition of volumetric features and develops effective coding techniques based on this representation. The proposed method employs a two-level coding strategy to capture the temporal characteristics of the decomposed features. At a higher level, spatial features are encoded, representing 3D structural information, with time-invariant components over short intervals of an MV video sequence. At a lower level, temporal features are encoded to capture the dynamics of current scenes. The spatial features are shared in a group, and temporal features are encoded at each time step. The experimental results demonstrate that the proposed technique outperforms existing MV video coding standards and current state-of-the-art methods, providing superior rate-distortion performance in the novel view synthesis of MV video compression.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"4412-4426"},"PeriodicalIF":9.7000,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Neural Volumetric Video Coding With Hierarchical Coded Representation of Dynamic Volume\",\"authors\":\"Ju-Yeon Shin;Jung-Kyung Lee;Gun Bang;Jun-Sik Kim;Je-Won Kang\",\"doi\":\"10.1109/TMM.2025.3544415\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article proposes a novel multi-view (MV) video coding technique that leverages a four-dimensional (4D) voxel-grid representation to enhance coding efficiency, particularly in novel view synthesis. Although the voxel grid approximation provides a continuous representation for dynamic scenes, its volumetric nature requires substantial storage. The compression of MV videos can be interpreted as the compression of dense features. However, the substantial size of these features poses a significant problem relative to the generation of dynamic scenes at arbitrary viewpoints. To address this challenge, this study introduces a hierarchical coded representation of dynamic volumes based on low-rank tensor decomposition of volumetric features and develops effective coding techniques based on this representation. The proposed method employs a two-level coding strategy to capture the temporal characteristics of the decomposed features. At a higher level, spatial features are encoded, representing 3D structural information, with time-invariant components over short intervals of an MV video sequence. At a lower level, temporal features are encoded to capture the dynamics of current scenes. 
The spatial features are shared in a group, and temporal features are encoded at each time step. The experimental results demonstrate that the proposed technique outperforms existing MV video coding standards and current state-of-the-art methods, providing superior rate-distortion performance in the novel view synthesis of MV video compression.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"27 \",\"pages\":\"4412-4426\"},\"PeriodicalIF\":9.7000,\"publicationDate\":\"2025-02-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10897849/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10897849/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Neural Volumetric Video Coding With Hierarchical Coded Representation of Dynamic Volume
This article proposes a novel multi-view (MV) video coding technique that leverages a four-dimensional (4D) voxel-grid representation to enhance coding efficiency, particularly in novel view synthesis. Although the voxel-grid approximation provides a continuous representation of dynamic scenes, its volumetric nature requires substantial storage. The compression of MV videos can be interpreted as the compression of dense features. However, the substantial size of these features poses a significant problem for generating dynamic scenes at arbitrary viewpoints. To address this challenge, this study introduces a hierarchical coded representation of dynamic volumes based on low-rank tensor decomposition of volumetric features and develops effective coding techniques based on this representation. The proposed method employs a two-level coding strategy to capture the temporal characteristics of the decomposed features. At the higher level, spatial features are encoded to represent 3D structural information whose components remain time-invariant over short intervals of an MV video sequence. At the lower level, temporal features are encoded to capture the dynamics of the current scene. The spatial features are shared within a group of frames, while the temporal features are encoded at each time step. The experimental results demonstrate that the proposed technique outperforms existing MV video coding standards and current state-of-the-art methods, providing superior rate-distortion performance for novel view synthesis in MV video compression.
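The abstract describes a two-level coded representation: spatial factors shared across a group of frames and a small temporal code per time step, obtained by low-rank tensor decomposition of the 4D feature volume. The sketch below is only an illustrative interpretation of that idea using a CP-style factorization; the class name, rank, factor shapes, and bitstream split are assumptions for exposition and do not reproduce the authors' exact formulation.

import numpy as np

class HierarchicalVolumeCode:
    # Illustrative two-level, low-rank coded representation of a dynamic
    # feature volume F[t, x, y, z]. Spatial factors (Ux, Uy, Uz) are shared
    # by every frame in a group (higher level); each time step contributes a
    # small coefficient vector Ut[t] (lower level). All names are hypothetical.

    def __init__(self, dims, rank, group_size, rng=None):
        X, Y, Z = dims
        rng = rng or np.random.default_rng(0)
        # Higher level: time-invariant spatial factors shared within the group.
        self.Ux = rng.standard_normal((X, rank)) * 0.1
        self.Uy = rng.standard_normal((Y, rank)) * 0.1
        self.Uz = rng.standard_normal((Z, rank)) * 0.1
        # Lower level: one small temporal coefficient vector per time step.
        self.Ut = rng.standard_normal((group_size, rank)) * 0.1

    def frame_volume(self, t):
        # Reconstruct the dense 3D feature volume at time step t as a
        # weighted CP-style sum: sum_r Ut[t, r] * Ux[:, r] x Uy[:, r] x Uz[:, r]
        w = self.Ut[t]  # shape (rank,)
        return np.einsum("r,xr,yr,zr->xyz", w, self.Ux, self.Uy, self.Uz)

    def bitstream_split(self):
        # Illustrative split of what would be coded per group vs. per frame:
        # spatial factors once per group, temporal factors once per time step.
        group_level = (self.Ux, self.Uy, self.Uz)
        frame_level = [self.Ut[t] for t in range(self.Ut.shape[0])]
        return group_level, frame_level

# Usage: a 32^3 feature grid, rank-8 factors, a group of 16 frames.
code = HierarchicalVolumeCode(dims=(32, 32, 32), rank=8, group_size=16)
vol_t0 = code.frame_volume(0)
print(vol_t0.shape)  # (32, 32, 32)

Under this toy factorization, storage drops from storing T dense volumes of size X*Y*Z to (X+Y+Z)*R shared values plus T*R per-frame coefficients, which is the intuition behind coding the spatial factors once per group and only the temporal factors at every time step.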
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.