SLAM2: Simultaneous Localization and Multimode Mapping for indoor dynamic environments

IF 7.5 | CAS Q1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)
Zhihao Lin , Qi Zhang , Zhen Tian , Peizhuo Yu , Ziyang Ye , Hanyang Zhuang , Jianglin Lan
{"title":"SLAM2:针对室内动态环境的同步定位和多模绘图","authors":"Zhihao Lin ,&nbsp;Qi Zhang ,&nbsp;Zhen Tian ,&nbsp;Peizhuo Yu ,&nbsp;Ziyang Ye ,&nbsp;Hanyang Zhuang ,&nbsp;Jianglin Lan","doi":"10.1016/j.patcog.2024.111054","DOIUrl":null,"url":null,"abstract":"<div><div>Traditional visual Simultaneous Localization and Mapping (SLAM) methods based on point features are often limited by strong static assumptions and texture information, resulting in inaccurate camera pose estimation and object localization. To address these challenges, we present SLAM<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>, a novel semantic RGB-D SLAM system that can obtain accurate estimation of the camera pose and the 6DOF pose of other objects, resulting in complete and clean static 3D model mapping in dynamic environments. Our system makes full use of the point, line, and plane features in space to enhance the camera pose estimation accuracy. It combines the traditional geometric method with a deep learning method to detect both known and unknown dynamic objects in the scene. Moreover, our system is designed with a three-mode mapping method, including dense, semi-dense, and sparse, where the mode can be selected according to the needs of different tasks. This makes our visual SLAM system applicable to diverse application areas. Evaluation in the TUM RGB-D and Bonn RGB-D datasets demonstrates that our SLAM system achieves the most advanced localization accuracy and the cleanest static 3D mapping of the scene in dynamic environments, compared to state-of-the-art methods. Specifically, our system achieves a root mean square error (RMSE) of 0.018 m in the highly dynamic TUM w/half sequence, outperforming ORB-SLAM3 (0.231 m) and DRG-SLAM (0.025 m). In the Bonn dataset, our system demonstrates superior performance in 14 out of 18 sequences, with an average RMSE reduction of 27.3% compared to the next best method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111054"},"PeriodicalIF":7.5000,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SLAM2: Simultaneous Localization and Multimode Mapping for indoor dynamic environments\",\"authors\":\"Zhihao Lin ,&nbsp;Qi Zhang ,&nbsp;Zhen Tian ,&nbsp;Peizhuo Yu ,&nbsp;Ziyang Ye ,&nbsp;Hanyang Zhuang ,&nbsp;Jianglin Lan\",\"doi\":\"10.1016/j.patcog.2024.111054\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Traditional visual Simultaneous Localization and Mapping (SLAM) methods based on point features are often limited by strong static assumptions and texture information, resulting in inaccurate camera pose estimation and object localization. To address these challenges, we present SLAM<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>, a novel semantic RGB-D SLAM system that can obtain accurate estimation of the camera pose and the 6DOF pose of other objects, resulting in complete and clean static 3D model mapping in dynamic environments. Our system makes full use of the point, line, and plane features in space to enhance the camera pose estimation accuracy. It combines the traditional geometric method with a deep learning method to detect both known and unknown dynamic objects in the scene. 
Moreover, our system is designed with a three-mode mapping method, including dense, semi-dense, and sparse, where the mode can be selected according to the needs of different tasks. This makes our visual SLAM system applicable to diverse application areas. Evaluation in the TUM RGB-D and Bonn RGB-D datasets demonstrates that our SLAM system achieves the most advanced localization accuracy and the cleanest static 3D mapping of the scene in dynamic environments, compared to state-of-the-art methods. Specifically, our system achieves a root mean square error (RMSE) of 0.018 m in the highly dynamic TUM w/half sequence, outperforming ORB-SLAM3 (0.231 m) and DRG-SLAM (0.025 m). In the Bonn dataset, our system demonstrates superior performance in 14 out of 18 sequences, with an average RMSE reduction of 27.3% compared to the next best method.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"158 \",\"pages\":\"Article 111054\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2024-09-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320324008057\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324008057","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Traditional visual Simultaneous Localization and Mapping (SLAM) methods based on point features are often limited by strong static assumptions and texture information, resulting in inaccurate camera pose estimation and object localization. To address these challenges, we present SLAM2, a novel semantic RGB-D SLAM system that can obtain accurate estimation of the camera pose and the 6DOF pose of other objects, resulting in complete and clean static 3D model mapping in dynamic environments. Our system makes full use of the point, line, and plane features in space to enhance the camera pose estimation accuracy. It combines the traditional geometric method with a deep learning method to detect both known and unknown dynamic objects in the scene. Moreover, our system is designed with a three-mode mapping method, including dense, semi-dense, and sparse, where the mode can be selected according to the needs of different tasks. This makes our visual SLAM system applicable to diverse application areas. Evaluation on the TUM RGB-D and Bonn RGB-D datasets demonstrates that our SLAM system achieves state-of-the-art localization accuracy and the cleanest static 3D mapping of the scene in dynamic environments, compared to state-of-the-art methods. Specifically, our system achieves a root mean square error (RMSE) of 0.018 m in the highly dynamic TUM w/half sequence, outperforming ORB-SLAM3 (0.231 m) and DRG-SLAM (0.025 m). In the Bonn dataset, our system demonstrates superior performance in 14 out of 18 sequences, with an average RMSE reduction of 27.3% compared to the next best method.
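
The localization figures quoted above are the root mean square error of the Absolute Trajectory Error (ATE), the standard metric on the TUM RGB-D and Bonn RGB-D benchmarks. As a reference only, and not the authors' evaluation code, the following is a minimal Python sketch of ATE RMSE after a rigid Kabsch/Horn alignment of the estimated trajectory to the ground truth; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def ate_rmse(gt_xyz: np.ndarray, est_xyz: np.ndarray) -> float:
    """RMSE of the Absolute Trajectory Error (ATE) in metres.

    gt_xyz, est_xyz: (N, 3) arrays of time-associated camera positions.
    Illustrative sketch of a TUM-style evaluation, not the authors' tooling.
    """
    # Centre both trajectories; the optimal translation is then implicit.
    gt_c = gt_xyz - gt_xyz.mean(axis=0)
    est_c = est_xyz - est_xyz.mean(axis=0)

    # Kabsch/Horn: best-fit rotation from the SVD of the cross-covariance.
    H = est_c.T @ gt_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])      # guard against an improper rotation (reflection)
    R = Vt.T @ D @ U.T              # rotation mapping the estimate onto the ground truth

    errors = np.linalg.norm(est_c @ R.T - gt_c, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```

Benchmark tools such as the TUM evaluation scripts or evo perform the same rigid alignment but also handle timestamp association between the estimated and ground-truth trajectories; the sketch assumes the poses are already associated one-to-one.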
Source journal: Pattern Recognition (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles per year: 683
Average review time: 5.6 months

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.