Self-Supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration

Impact Factor: 18.6
Yifan Zhang;Junhui Hou;Siyu Ren;Jinjian Wu;Yixuan Yuan;Guangming Shi
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 10, pp. 9201-9216
DOI: 10.1109/TPAMI.2025.3584625
Published: 2025-06-30
URL: https://ieeexplore.ieee.org/document/11059839/
Citations: 0

Abstract

This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes. Specifically, our approach, namely NCLR, focuses on 2D-3D neural calibration, a novel pretext task that estimates the rigid pose aligning camera and LiDAR coordinate systems. First, we propose the learnable transformation alignment to bridge the domain gap between image and point cloud data, converting features into a unified representation space for effective comparison and matching. Second, we identify the overlapping area between the image and point cloud with the fused features. Third, we establish dense 2D-3D correspondences to estimate the rigid pose. The framework not only learns fine-grained matching from points to pixels but also achieves alignment of the image and point cloud at a holistic level, understanding the LiDAR-to-camera extrinsic parameters. We demonstrate the efficacy of NCLR by applying the pre-trained backbone to downstream tasks, such as LiDAR-based 3D semantic segmentation, object detection, and panoptic segmentation. Comprehensive experiments on various datasets illustrate the superiority of NCLR over existing self-supervised methods. The results confirm that joint learning from different modalities significantly enhances the network’s understanding abilities and effectiveness of learned representation.
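The core geometric step described above — recovering the rigid LiDAR-to-camera pose from dense 2D-3D correspondences — can be illustrated with a classical baseline. The sketch below is not the paper's learned pipeline; it is a minimal Direct Linear Transform (DLT) pose solver in NumPy, assuming noise-free point-to-pixel matches and known camera intrinsics `K`, to show what "estimating the rigid pose from 2D-3D correspondences" computes.

```python
import numpy as np

def estimate_pose_dlt(pts3d, pts2d, K):
    """Recover the rigid pose [R|t] mapping LiDAR points into the camera frame
    from 2D-3D correspondences via the Direct Linear Transform.

    pts3d: (N, 3) LiDAR points, pts2d: (N, 2) pixel coordinates,
    K: (3, 3) camera intrinsics. Requires N >= 6 non-degenerate matches.
    """
    n = len(pts3d)
    # Normalise pixels: x_n = K^{-1} [u, v, 1]^T
    xn = (np.linalg.inv(K) @ np.hstack([pts2d, np.ones((n, 1))]).T).T
    u, v = xn[:, 0] / xn[:, 2], xn[:, 1] / xn[:, 2]

    # Each match contributes two rows of the 2N x 12 homogeneous system A p = 0,
    # where p stacks the rows P1, P2, P3 of the 3x4 pose matrix [R|t].
    X = np.hstack([pts3d, np.ones((n, 1))])  # homogeneous 3D points, (N, 4)
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X                          # P1 . X - u (P3 . X) = 0
    A[0::2, 8:12] = -u[:, None] * X
    A[1::2, 4:8] = X                          # P2 . X - v (P3 . X) = 0
    A[1::2, 8:12] = -v[:, None] * X

    # Least-squares null vector of A gives [R|t] up to an unknown scale.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Project the left 3x3 block onto SO(3) and undo the scale/sign ambiguity.
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R, scale = U @ Vt2, S.mean()
    if np.linalg.det(R) < 0:  # keep a proper rotation (det = +1)
        R, scale = -R, -scale
    t = P[:, 3] / scale
    return R, t
```

In NCLR this role is played by a learned, differentiable matcher rather than a closed-form solver, but the output is the same object: the extrinsic rotation and translation aligning the two coordinate systems.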