{"title":"SPOT: Scalable 3D Pre-training via Occupancy Prediction for Learning Transferable 3D Representations.","authors":"Xiangchao Yan,Runjian Chen,Bo Zhang,Hancheng Ye,Renqiu Xia,Jiakang Yuan,Hongbin Zhou,Xinyu Cai,Botian Shi,Wenqi Shao,Ping Luo,Yu Qiao,Tao Chen,Junchi Yan","doi":"10.1109/tpami.2025.3586961","DOIUrl":null,"url":null,"abstract":"Annotating 3D LiDAR point clouds for perception tasks is fundamental for many applications e.g. autonomous driving, yet it still remains notoriously labor-intensive. Pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets as well as tasks. In this paper, we propose SPOT, namely Scalable Pre-training via Occupancy prediction for learning Transferable 3D representations under such a label-efficient fine-tuning paradigm. SPOT achieves effectiveness on various public datasets with different downstream tasks, showcasing its general representation power, cross-domain robustness and data scalability which are three key factors for real-world application. Specifically, we both theoretically and empirically show, for the first time, that general representations learning can be achieved through the task of occupancy prediction. Then, to address the domain gap caused by different LiDAR sensors and annotation methods, we develop a beam re-sampling technique for point cloud augmentation combined with class-balancing strategy. Furthermore, scalable pre-training is observed, that is, the downstream performance across all the experiments gets better with more pre-training data. Additionally, such pre-training strategy also remains compatible with unlabeled data. The hope is that our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"21 1","pages":""},"PeriodicalIF":20.8000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Pattern Analysis and Machine Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tpami.2025.3586961","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Annotating 3D LiDAR point clouds for perception tasks is fundamental for many applications e.g. autonomous driving, yet it still remains notoriously labor-intensive. Pretraining-finetuning approach can alleviate the labeling burden by fine-tuning a pre-trained backbone across various downstream datasets as well as tasks. In this paper, we propose SPOT, namely Scalable Pre-training via Occupancy prediction for learning Transferable 3D representations under such a label-efficient fine-tuning paradigm. SPOT achieves effectiveness on various public datasets with different downstream tasks, showcasing its general representation power, cross-domain robustness and data scalability which are three key factors for real-world application. Specifically, we both theoretically and empirically show, for the first time, that general representations learning can be achieved through the task of occupancy prediction. Then, to address the domain gap caused by different LiDAR sensors and annotation methods, we develop a beam re-sampling technique for point cloud augmentation combined with class-balancing strategy. Furthermore, scalable pre-training is observed, that is, the downstream performance across all the experiments gets better with more pre-training data. Additionally, such pre-training strategy also remains compatible with unlabeled data. The hope is that our findings will facilitate the understanding of LiDAR points and pave the way for future advancements in LiDAR pre-training.
期刊介绍:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.