Self-Supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration

Impact Factor: 18.6
Yifan Zhang;Junhui Hou;Siyu Ren;Jinjian Wu;Yixuan Yuan;Guangming Shi
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 10, pp. 9201-9216
DOI: 10.1109/TPAMI.2025.3584625
Published: 2025-06-30
URL: https://ieeexplore.ieee.org/document/11059839/
Citations: 0

Abstract

This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes. Specifically, our approach, namely NCLR, focuses on 2D-3D neural calibration, a novel pretext task that estimates the rigid pose aligning camera and LiDAR coordinate systems. First, we propose the learnable transformation alignment to bridge the domain gap between image and point cloud data, converting features into a unified representation space for effective comparison and matching. Second, we identify the overlapping area between the image and point cloud with the fused features. Third, we establish dense 2D-3D correspondences to estimate the rigid pose. The framework not only learns fine-grained matching from points to pixels but also achieves alignment of the image and point cloud at a holistic level, understanding the LiDAR-to-camera extrinsic parameters. We demonstrate the efficacy of NCLR by applying the pre-trained backbone to downstream tasks, such as LiDAR-based 3D semantic segmentation, object detection, and panoptic segmentation. Comprehensive experiments on various datasets illustrate the superiority of NCLR over existing self-supervised methods. The results confirm that joint learning from different modalities significantly enhances the network’s understanding abilities and effectiveness of learned representation.
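The core geometric step described above — recovering the rigid LiDAR-to-camera pose from dense 2D-3D correspondences — can be illustrated with a classical baseline. The sketch below is not the paper's learned pipeline; it is a minimal Direct Linear Transform (DLT) pose solver in NumPy, assuming noise-free point-to-pixel matches and known camera intrinsics `K`, to show what "estimating the rigid pose from 2D-3D correspondences" computes.

```python
import numpy as np

def estimate_pose_dlt(pts3d, pts2d, K):
    """Recover the rigid pose [R|t] mapping LiDAR points into the camera frame
    from 2D-3D correspondences via the Direct Linear Transform.

    pts3d: (N, 3) LiDAR points, pts2d: (N, 2) pixel coordinates,
    K: (3, 3) camera intrinsics. Requires N >= 6 non-degenerate matches.
    """
    n = len(pts3d)
    # Normalise pixels: x_n = K^{-1} [u, v, 1]^T
    xn = (np.linalg.inv(K) @ np.hstack([pts2d, np.ones((n, 1))]).T).T
    u, v = xn[:, 0] / xn[:, 2], xn[:, 1] / xn[:, 2]

    # Each match contributes two rows of the 2N x 12 homogeneous system A p = 0,
    # where p stacks the rows P1, P2, P3 of the 3x4 pose matrix [R|t].
    X = np.hstack([pts3d, np.ones((n, 1))])  # homogeneous 3D points, (N, 4)
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = X                          # P1 . X - u (P3 . X) = 0
    A[0::2, 8:12] = -u[:, None] * X
    A[1::2, 4:8] = X                          # P2 . X - v (P3 . X) = 0
    A[1::2, 8:12] = -v[:, None] * X

    # Least-squares null vector of A gives [R|t] up to an unknown scale.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Project the left 3x3 block onto SO(3) and undo the scale/sign ambiguity.
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R, scale = U @ Vt2, S.mean()
    if np.linalg.det(R) < 0:  # keep a proper rotation (det = +1)
        R, scale = -R, -scale
    t = P[:, 3] / scale
    return R, t
```

In NCLR this role is played by a learned, differentiable matcher rather than a closed-form solver, but the output is the same object: the extrinsic rotation and translation aligning the two coordinate systems.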