Xuan Wang, Yuan Zhuang, Xiaoxiang Cao, Jianzhu Huai, Zhenghua Zhang, Zhenqi Zheng, Naser El-Sheimy
ISPRS Journal of Photogrammetry and Remote Sensing, vol. 224, pp. 75-93. Published 2025-04-09. DOI: 10.1016/j.isprsjprs.2025.03.011. URL: https://www.sciencedirect.com/science/article/pii/S0924271625001091
GAT-LSTM: A feature point management network with graph attention for feature-based visual SLAM in dynamic environments
Visual simultaneous localization and mapping (vSLAM) is crucial in various applications, ranging from robotics to augmented reality. However, dynamic environments pose difficulties for vSLAM, which often relies on extracted feature points (FPs), and managing FPs effectively in such environments is a significant challenge. To address this challenge, we propose a solution that integrates a graph attention network (GAT) into a long short-term memory (LSTM) network, enabling the system to focus its attention on stable FPs. The GAT component extracts spatial structural information from the FPs of an individual image, which serve as graph nodes, thereby modeling the local relationships of each FP. Meanwhile, the LSTM module analyzes the temporal consistency of these local associations. Our approach blends local relationship modeling with global consistency analysis, presenting the first application of GAT-LSTM to tackle the complexities introduced by dynamic and error-tracking FPs. Additionally, we introduce a backpropagating epipolar geometry solver so that this otherwise non-backpropagatable optimization module can be used in a deep neural network. Moreover, monocular vSLAM cannot directly measure distances and typically depends on reference objects or motion information, and depth estimation is complex and error-prone due to texture deficiency and motion blur. Thus, we present a dense depth estimation approach that mitigates these challenges by leveraging the selected stable FPs and a depth estimation network. We validated the GAT-LSTM network within a pure visual odometry (VO) framework and a visual-inertial odometry (VIO) framework using the KITTI, VIODE, and in-house datasets. These experiments demonstrated that excluding dynamic and error-tracking FPs with GAT-LSTM significantly enhances odometry accuracy and robustness. Compared to existing methods, the root-mean-square error of the absolute pose error decreased by 4.52%–76.86% in VO and by 9.09%–96.94% in VIO. Our practice offers valuable insights and potential applications for more robust and accurate vSLAM and other related fields, highlighting the benefits of integrating GAT and LSTM networks.
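To make the idea of treating FPs as graph nodes concrete, the following is a minimal NumPy sketch of a generic single-head graph attention layer applied to a k-nearest-neighbor graph of keypoints. It is not the authors' network: the function names `knn_adjacency` and `gat_layer`, the choice of a k-NN graph, the feature sizes, and the random weights are all assumptions made for illustration, and the abstract does not say how the graph is built.

```python
import numpy as np

def knn_adjacency(points, k):
    """Boolean adjacency linking each point to its k nearest neighbors and to itself.

    The k-NN construction is an assumption for this sketch.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # exclude self from the neighbor search
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.arange(n)[:, None], nearest] = True
    np.fill_diagonal(adj, True)               # add self-loops
    return adj

def gat_layer(h, adj, W, a, slope=0.2):
    """Generic single-head graph attention (standard GAT scoring, no output nonlinearity).

    h: (n, f_in) node features, W: (f_in, f_out), a: (2 * f_out,) attention vector.
    """
    z = h @ W
    f_out = z.shape[1]
    # e[i, j] = a^T [z_i || z_j], split into a source term and a neighbor term
    e = (z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :]
    e = np.where(e > 0, e, slope * e)         # LeakyReLU
    e = np.where(adj, e, -np.inf)             # mask out non-neighbors
    e = e - e.max(axis=1, keepdims=True)      # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)  # attention weights over each neighborhood
    return alpha @ z, alpha

# Hypothetical data: 6 keypoints with 4-dimensional descriptors
rng = np.random.default_rng(0)
pts = rng.uniform(0, 640, size=(6, 2))
h = rng.normal(size=(6, 4))
W = rng.normal(size=(4, 3))
a = rng.normal(size=6)
adj = knn_adjacency(pts, k=2)
out, alpha = gat_layer(h, adj, W, a)
```

Each row of `alpha` holds the attention one FP pays to its neighbors. In a pipeline like the one the abstract describes, such per-frame outputs would presumably be passed to a temporal model, but that wiring is not specified here.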
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.