Leveraging occupancy map to accelerate video-based point cloud compression

IF 2.6 · Region 4, Computer Science · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation · Pub Date: 2024-09-16 · DOI: 10.1016/j.jvcir.2024.104292

Wenyu Wang, Gongchun Ding, Dandan Ding

Volume 104, Article 104292 · https://www.sciencedirect.com/science/article/pii/S1047320324002487

Citations: 0

Abstract

Video-based Point Cloud Compression (V-PCC) enables point cloud streaming over the internet by converting dynamic 3D point clouds into 2D geometry and attribute videos, which are then compressed using 2D video codecs such as H.266/VVC. However, the complex encoding process of H.266/VVC, in particular the quadtree with nested multi-type tree (QTMT) partition, greatly hinders the practical application of V-PCC. To address this issue, we propose a fast CU partition method dedicated to V-PCC to accelerate the coding process. Specifically, we classify the coding units (CUs) of projected images into three categories based on the occupancy map of a point cloud: unoccupied, partially occupied, and fully occupied. We then employ either statistics-based rules or machine-learning models to manage the partition of each category. For unoccupied CUs, we terminate the partition directly; for partially occupied CUs with explicit directions, we selectively skip certain partition candidates; for the remaining CUs (partially occupied CUs with complex directions and fully occupied CUs), we train an edge-driven LightGBM model to predict the probability of each partition candidate automatically. Only partitions with high probabilities are retained for further Rate–Distortion (R–D) decisions. Comprehensive experiments demonstrate the superior performance of the proposed method: under the V-PCC common test conditions, it reduces encoding time by 52% for geometry and 44% for attribute, while incurring only 0.68% (0.66%) BD-Rate loss in D1 (D2) measurements and 0.79% (luma) BD-Rate loss in attribute, significantly surpassing state-of-the-art methods.
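The three-way CU classification and the probability-based pruning described in the abstract can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function names `classify_cu` and `retain_candidates`, the binary list-of-lists occupancy map, the candidate labels, and the threshold value are all assumptions made for the example, and the fallback that keeps the single most probable candidate is likewise an assumption rather than something the abstract states.

```python
from enum import Enum
from typing import Dict, List, Sequence


class CUCategory(Enum):
    UNOCCUPIED = "unoccupied"
    PARTIALLY_OCCUPIED = "partially occupied"
    FULLY_OCCUPIED = "fully occupied"


def classify_cu(
    occupancy: Sequence[Sequence[int]], x: int, y: int, width: int, height: int
) -> CUCategory:
    """Classify the CU with top-left corner (x, y) from a binary occupancy map.

    Assumes the CU lies entirely inside the map and that a non-zero
    entry marks an occupied pixel (hypothetical data layout).
    """
    occupied = sum(
        1
        for row in occupancy[y:y + height]
        for value in row[x:x + width]
        if value
    )
    if occupied == 0:
        return CUCategory.UNOCCUPIED
    if occupied == width * height:
        return CUCategory.FULLY_OCCUPIED
    return CUCategory.PARTIALLY_OCCUPIED


def retain_candidates(probabilities: Dict[str, float], threshold: float) -> List[str]:
    """Keep partition candidates whose predicted probability reaches the threshold.

    The threshold is a hypothetical tuning parameter. If no candidate
    passes, the most probable one is kept so that the R-D search always
    has something to evaluate (an assumption of this sketch).
    """
    kept = [mode for mode, p in probabilities.items() if p >= threshold]
    if not kept:
        kept = [max(probabilities, key=probabilities.get)]
    return kept


if __name__ == "__main__":
    occupancy_map = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
    ]
    print(classify_cu(occupancy_map, 0, 0, 2, 2))
    print(classify_cu(occupancy_map, 2, 0, 2, 2))
    print(classify_cu(occupancy_map, 2, 2, 2, 2))
    print(retain_candidates({"NO_SPLIT": 0.1, "QT": 0.7, "BT_H": 0.15, "BT_V": 0.05}, 0.5))
```

In an encoder, an unoccupied result would stop partitioning at once, while the other two categories would go on to the rule-based skipping or the learned predictor; those later stages are not modeled here.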




Source journal

Journal of Visual Communication and Image Representation · Engineering & Technology - Computer Science: Software Engineering

CiteScore

5.40

Self-citation rate

11.50%

Articles published

188

Review time

9.9 months

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.