{"title":"Self-Supervised 3D Semantic Occupancy Prediction from Multi-View 2D Surround Images","authors":"S. Abualhanud, E. Erahan, M. Mehltretter","doi":"10.1007/s41064-024-00308-9","DOIUrl":null,"url":null,"abstract":"<p>An accurate 3D representation of the geometry and semantics of an environment builds the basis for a large variety of downstream tasks and is essential for autonomous driving related tasks such as path planning and obstacle avoidance. The focus of this work is put on 3D semantic occupancy prediction, i.e., the reconstruction of a scene as a voxel grid where each voxel is assigned both an occupancy and a semantic label. We present a Convolutional Neural Network-based method that utilizes multiple color images from a surround-view setup with minimal overlap, together with the associated interior and exterior camera parameters as input, to reconstruct the observed environment as a 3D semantic occupancy map. To account for the ill-posed nature of reconstructing a 3D representation from monocular 2D images, the image information is integrated over time: Under the assumption that the camera setup is moving, images from consecutive time steps are used to form a multi-view stereo setup. In exhaustive experiments, we investigate the challenges presented by dynamic objects and the possibilities of training the proposed method with either 3D or 2D reference data. Latter being motivated by the comparably higher costs of generating and annotating 3D ground truth data. Moreover, we present and investigate a novel self-supervised training scheme that does not require any geometric reference data, but only relies on sparse semantic ground truth. An evaluation on the Occ3D dataset, including a comparison against current state-of-the-art self-supervised methods from the literature, demonstrates the potential of our self-supervised variant.</p>","PeriodicalId":56035,"journal":{"name":"PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science","volume":null,"pages":null},"PeriodicalIF":2.1000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science","FirstCategoryId":"89","ListUrlMain":"https://doi.org/10.1007/s41064-024-00308-9","RegionNum":4,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY","Score":null,"Total":0}
Abstract
An accurate 3D representation of the geometry and semantics of an environment builds the basis for a large variety of downstream tasks and is essential for autonomous driving tasks such as path planning and obstacle avoidance. This work focuses on 3D semantic occupancy prediction, i.e., the reconstruction of a scene as a voxel grid in which each voxel is assigned both an occupancy and a semantic label. We present a Convolutional Neural Network-based method that takes as input multiple color images from a surround-view setup with minimal overlap, together with the associated interior and exterior camera parameters, and reconstructs the observed environment as a 3D semantic occupancy map. To account for the ill-posed nature of reconstructing a 3D representation from monocular 2D images, the image information is integrated over time: under the assumption that the camera setup is moving, images from consecutive time steps are used to form a multi-view stereo setup. In extensive experiments, we investigate the challenges posed by dynamic objects and the possibilities of training the proposed method with either 3D or 2D reference data, the latter being motivated by the comparatively higher cost of generating and annotating 3D ground truth data. Moreover, we present and investigate a novel self-supervised training scheme that does not require any geometric reference data but relies only on sparse semantic ground truth. An evaluation on the Occ3D dataset, including a comparison against current state-of-the-art self-supervised methods from the literature, demonstrates the potential of our self-supervised variant.
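To make the two central ingredients of the abstract concrete, the following is a minimal sketch of (a) a semantic occupancy grid, where every voxel carries an occupancy flag and a semantic label, and (b) the projection of voxel centres into a camera using its interior and exterior parameters. The grid shape, voxel size, and class count are illustrative placeholders, not values from the paper, and the function is a generic pinhole projection rather than the authors' actual pipeline.

```python
import numpy as np

# Illustrative placeholders; the paper's Occ3D grid resolution and
# class list are not specified here.
GRID_SHAPE = (200, 200, 16)   # (x, y, z) voxels, assumed
VOXEL_SIZE = 0.4              # metres per voxel edge, assumed
NUM_CLASSES = 17              # semantic classes incl. "free", assumed

# A 3D semantic occupancy map as described in the abstract: each voxel
# gets an occupancy flag and a semantic label.
occupancy = np.zeros(GRID_SHAPE, dtype=bool)
semantics = np.zeros(GRID_SHAPE, dtype=np.int64)  # 0 = "free", assumed

def project_voxels(centers_world, K, T_world_to_cam):
    """Project voxel centres into one camera.

    centers_world:  (N, 3) voxel centre coordinates in the world frame.
    K:              (3, 3) interior camera parameters (calibration matrix).
    T_world_to_cam: (4, 4) exterior parameters as a homogeneous transform.
    Returns (N, 2) pixel coordinates and an (N,) mask of points that lie
    in front of the camera.
    """
    homog = np.hstack([centers_world, np.ones((len(centers_world), 1))])
    cam = (T_world_to_cam @ homog.T).T[:, :3]   # world -> camera frame
    in_front = cam[:, 2] > 1e-6                 # keep points ahead of the camera
    pix = (K @ cam.T).T                         # pinhole projection
    pix = pix[:, :2] / np.clip(pix[:, 2:3], 1e-6, None)
    return pix, in_front
```

Comparing the semantic labels predicted for the voxels that project onto a pixel with that pixel's 2D semantic annotation is, roughly, the kind of sparse 2D supervision the self-supervised variant described above can exploit in place of 3D ground truth.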
Journal Description
PFG is an international scholarly journal covering the progress and application of photogrammetric methods, remote sensing technology and the interconnected field of geoinformation science. It places special editorial emphasis on the communication of new methodologies in data acquisition and on new approaches to the optimized processing and interpretation of all types of data acquired by photogrammetric methods, remote sensing and image processing, as well as on the computer-aided interpretation of such data in general. The journal hence addresses both researchers and students of these disciplines at academic institutions and universities as well as downstream users in the private sector and public administration.
Founded in 1926 under the former name Bildmessung und Luftbildwesen, PFG is the world's oldest journal on photogrammetry. It is the official journal of the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF).