Sha Lu;Xuecheng Xu;Dongkun Zhang;Yuxuan Wu;Haojian Lu;Xieyuanli Chen;Rong Xiong;Yue Wang
IEEE Transactions on Robotics, vol. 41, pp. 1861-1881, published 2025-02-18. DOI: 10.1109/TRO.2025.3543267. URL: https://ieeexplore.ieee.org/document/10891747/
RING#: PR-By-PE Global Localization With Roto-Translation Equivariant Gram Learning
Global localization using onboard perception sensors, such as cameras and light detection and ranging (LiDAR) sensors, is crucial in autonomous driving and robotics applications when Global Positioning System (GPS) signals are unreliable. Most approaches achieve global localization by sequential place recognition (PR) and pose estimation (PE). Some methods train separate models for each task, while others employ a single model with dual heads, trained jointly with separate task-specific losses. However, localization accuracy depends heavily on the success of PR, which often fails under significant changes in viewpoint or environmental appearance. A failed PR in turn renders the subsequent PE ineffective. To address this, we introduce a new paradigm, PR-by-PE localization, which bypasses the need for separate PR by directly deriving it from PE. We propose RING#, an end-to-end PR-by-PE localization network that operates in the bird's-eye-view (BEV) space and is compatible with both vision and LiDAR sensors. RING# incorporates a novel design that learns two equivariant representations from BEV features, enabling globally convergent and computationally efficient PE. Comprehensive experiments on the North Campus Long-Term Vision and LiDAR (NCLT) and Oxford datasets show that RING# outperforms state-of-the-art methods in both vision and LiDAR modalities, validating the effectiveness of the proposed approach.
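The PR-by-PE idea, in which the recognized place falls out of pose estimation instead of a separate retrieval step, can be illustrated with a toy sketch. This is not the RING# network and makes several assumptions that do not come from the abstract: BEV maps are plain 2-D NumPy arrays, the pose is a pure circular translation (rotation is ignored), and matching uses normalized FFT cross-correlation in place of learned equivariant features. The names `normalize`, `best_shift`, and `pr_by_pe` are hypothetical.

```python
import numpy as np


def normalize(bev):
    """Zero-mean, unit-norm copy so scores are comparable across candidates."""
    centered = bev - bev.mean()
    return centered / (np.linalg.norm(centered) + 1e-12)


def best_shift(query, ref):
    """Exhaustive translation search via circular cross-correlation (FFT).

    Returns (score, (dy, dx)) where query is approximately
    np.roll(ref, (dy, dx), axis=(0, 1)) and score is at most 1.0.
    """
    q, r = normalize(query), normalize(ref)
    corr = np.fft.ifft2(np.fft.fft2(q) * np.conj(np.fft.fft2(r))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return float(corr[dy, dx]), (int(dy), int(dx))


def pr_by_pe(query, map_bevs):
    """Assumed PR-by-PE loop: solve PE against every map candidate,
    then take the candidate with the highest PE score as the place."""
    results = [best_shift(query, ref) for ref in map_bevs]
    place = int(np.argmax([score for score, _ in results]))
    score, shift = results[place]
    return place, shift, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three synthetic "places"; the query is place 1 seen from a shifted pose.
    map_bevs = [rng.random((16, 16)) for _ in range(3)]
    query = np.roll(map_bevs[1], (3, 5), axis=(0, 1))
    place, shift, score = pr_by_pe(query, map_bevs)
    print(place, shift, round(score, 3))
```

Because the correlation evaluates every translation at once, the pose search in this sketch is global and needs no initial guess, and the place decision reuses the same score. That mirrors, in a much simplified form, why a globally convergent PE can stand in for a separate PR stage.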
About the journal:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.