LiDAR-based place recognition for mobile robots in ground/water surface multiple scenes

IF 4.2 | Zone 2, Computer Science | JCR Q2 ROBOTICS

Journal of Field Robotics Pub Date : 2024-08-31 DOI:10.1002/rob.22423

Yaxuan Yan, Haiyang Zhang, Changming Zhao, Xuan Liu, Siyuan Fu

Journal of Field Robotics, volume 42, issue 2, pages 539-558 (Journal Article)

Citations: 0

Abstract

LiDAR-based 3D place recognition is an essential component of simultaneous localization and mapping (SLAM) systems in multi-scene robotic applications. However, extracting discriminative and generalizable global descriptors from point clouds remains an open problem, because existing approaches make insufficient use of the information contained in LiDAR scans. In this paper, we propose a novel spatial-temporal point cloud encoding network for multiple scenes, dubbed STM-Net, that fuses the multi-view spatial information and the temporal information of LiDAR point clouds. Specifically, we first develop a spatial feature encoding module consisting of a single-view transformer and a multi-view transformer. The module learns correlations both within a single view and between two views, using multi-layer range images generated by spherical projection and multi-layer bird's-eye-view images generated by top-down projection. Then, in the temporal feature encoding module, a temporal transformer mines the temporal information in sequential point clouds, and a NetVLAD layer aggregates features to generate sub-descriptors. Finally, a GeM pooling layer fuses information along the time dimension to produce the final global descriptors. Extensive experiments on unmanned ground and surface vehicles with different LiDAR configurations indicate that our method (1) achieves better place recognition performance than state-of-the-art algorithms, (2) generalizes well to diverse scenes, (3) is robust to viewpoint changes, and (4) can operate in real time. These results demonstrate the effectiveness of the proposed approach and highlight its promise for multi-scene place recognition tasks.
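As a rough illustration of the two projections and the pooling step named in the abstract, the NumPy sketch below builds a single-layer range image by spherical projection, a single-layer bird's-eye-view image by top-down projection, and applies generalized-mean (GeM) pooling along a time axis. It is not the authors' STM-Net code: the function names, image sizes, vertical field of view, grid extent, and the exponent `p` are all assumptions chosen for illustration, and the transformers, NetVLAD layer, and multi-layer splitting are omitted.

```python
import numpy as np


def range_image(points, height=32, width=360, fov_up_deg=15.0, fov_down_deg=-15.0):
    """Spherical projection of an (N, 3) cloud to a (height, width) range image.

    Image size and vertical field of view are assumed values, not the paper's.
    """
    r = np.linalg.norm(points, axis=1)
    keep = r > 1e-6  # drop points at the sensor origin
    x, y, z, r = points[keep, 0], points[keep, 1], points[keep, 2], r[keep]
    yaw = np.arctan2(y, x)    # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)  # elevation angle
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    u = 0.5 * (1.0 - yaw / np.pi) * width                # column from azimuth
    v = (fov_up - pitch) / (fov_up - fov_down) * height  # row from elevation
    u = np.clip(np.floor(u), 0, width - 1).astype(int)
    v = np.clip(np.floor(v), 0, height - 1).astype(int)
    image = np.full((height, width), np.inf)
    np.minimum.at(image, (v, u), r)  # keep the nearest return per pixel
    image[np.isinf(image)] = 0.0     # empty pixels become 0
    return image


def bev_image(points, grid=64, max_range=50.0):
    """Top-down projection of an (N, 3) cloud to a (grid, grid) max-height image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (np.abs(x) < max_range) & (np.abs(y) < max_range)
    col = ((x[keep] + max_range) / (2.0 * max_range) * grid).astype(int)
    row = ((y[keep] + max_range) / (2.0 * max_range) * grid).astype(int)
    col = np.clip(col, 0, grid - 1)
    row = np.clip(row, 0, grid - 1)
    image = np.full((grid, grid), -np.inf)
    np.maximum.at(image, (row, col), z[keep])  # keep the highest point per cell
    image[np.isinf(image)] = 0.0               # empty cells become 0
    return image


def gem_pool(sub_descriptors, p=3.0, eps=1e-6):
    """GeM pooling over the first (time) axis of a (T, D) array.

    p = 1 reduces to average pooling; large p approaches max pooling.
    """
    clamped = np.clip(sub_descriptors, eps, None)
    return np.mean(clamped ** p, axis=0) ** (1.0 / p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(-20.0, 20.0, size=(1000, 3))  # synthetic point cloud
    print(range_image(cloud).shape, bev_image(cloud).shape)
    sequence = rng.uniform(0.0, 1.0, size=(5, 256))   # 5 stand-in sub-descriptors
    print(gem_pool(sequence).shape)
```

In a learned pipeline `p` is usually a trainable parameter; it is fixed here only to keep the sketch dependency-free.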




Source journal

Journal of Field Robotics (Engineering and Technology: Robotics)

CiteScore

15.00

Self-citation rate

3.60%

Articles published

Review time

6 months

About the journal: The Journal of Field Robotics seeks to promote scholarly publications dealing with the fundamentals of robotics in unstructured and dynamic environments. The Journal focuses on experimental robotics and encourages publication of work that has both theoretical and practical significance.