Thomas Pritchard, Saifullah Ijaz, Ronald Clark, Basaran Bahadir Kocer
{"title":"ForestVO:通过ForestGlue增强森林环境中的视觉里程测量","authors":"Thomas Pritchard;Saifullah Ijaz;Ronald Clark;Basaran Bahadir Kocer","doi":"10.1109/LRA.2025.3557738","DOIUrl":null,"url":null,"abstract":"Recent advancements in visual odometry systems have improved autonomous navigation, yet challenges persist in complex environments like forests, where dense foliage, variable lighting, and repetitive textures compromise the accuracy of feature correspondences. To address these challenges, we introduce ForestGlue. ForestGlue enhances the SuperPoint feature detector through four configurations – grayscale, RGB, RGB-D, and stereo-vision inputs – optimised for various sensing modalities. For feature matching, we employ LightGlue or SuperGlue, both of which have been retrained using synthetic forest data. ForestGlue achieves comparable pose estimation accuracy to baseline LightGlue and SuperGlue models, yet require only 512 keypoints, just 25% of the 2048 keypoints used by baseline models, to achieve an LO-RANSAC AUC score of 0.745 at a 10° threshold. With a 1/4 of the keypoints required, ForestGlue has the potential to reduce computational overhead whilst being effective in dynamic forest environments, making it a promising candidate for real-time deployment on resource-constrained platforms such as drones or mobile robotic platforms. By combining ForestGlue with a novel transformer based pose estimation model, we propose ForestVO, which estimates relative camera poses using the 2D pixel coordinates of matched features between frames. On challenging TartanAir forest sequences, ForestVO achieves an average relative pose error (RPE) of 1.09 m and kitti_score of 2.33%, outperforming direct-based methods such as DSO in dynamic scenes by 40%, while maintaining competitive performance with TartanVO despite being a significantly lighter model trained on only 10% of the dataset. This work establishes an end-to-end deep learning pipeline tailored for visual odometry in forested environments, leveraging forest-specific training data to optimise feature correspondence and pose estimation for improved accuracy and robustness in autonomous navigation systems.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 6","pages":"5233-5240"},"PeriodicalIF":4.6000,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ForestVO: Enhancing Visual Odometry in Forest Environments Through ForestGlue\",\"authors\":\"Thomas Pritchard;Saifullah Ijaz;Ronald Clark;Basaran Bahadir Kocer\",\"doi\":\"10.1109/LRA.2025.3557738\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent advancements in visual odometry systems have improved autonomous navigation, yet challenges persist in complex environments like forests, where dense foliage, variable lighting, and repetitive textures compromise the accuracy of feature correspondences. To address these challenges, we introduce ForestGlue. ForestGlue enhances the SuperPoint feature detector through four configurations – grayscale, RGB, RGB-D, and stereo-vision inputs – optimised for various sensing modalities. For feature matching, we employ LightGlue or SuperGlue, both of which have been retrained using synthetic forest data. 
ForestGlue achieves comparable pose estimation accuracy to baseline LightGlue and SuperGlue models, yet require only 512 keypoints, just 25% of the 2048 keypoints used by baseline models, to achieve an LO-RANSAC AUC score of 0.745 at a 10° threshold. With a 1/4 of the keypoints required, ForestGlue has the potential to reduce computational overhead whilst being effective in dynamic forest environments, making it a promising candidate for real-time deployment on resource-constrained platforms such as drones or mobile robotic platforms. By combining ForestGlue with a novel transformer based pose estimation model, we propose ForestVO, which estimates relative camera poses using the 2D pixel coordinates of matched features between frames. On challenging TartanAir forest sequences, ForestVO achieves an average relative pose error (RPE) of 1.09 m and kitti_score of 2.33%, outperforming direct-based methods such as DSO in dynamic scenes by 40%, while maintaining competitive performance with TartanVO despite being a significantly lighter model trained on only 10% of the dataset. This work establishes an end-to-end deep learning pipeline tailored for visual odometry in forested environments, leveraging forest-specific training data to optimise feature correspondence and pose estimation for improved accuracy and robustness in autonomous navigation systems.\",\"PeriodicalId\":13241,\"journal\":{\"name\":\"IEEE Robotics and Automation Letters\",\"volume\":\"10 6\",\"pages\":\"5233-5240\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2025-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Robotics and Automation Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10948308/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10948308/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
ForestVO: Enhancing Visual Odometry in Forest Environments Through ForestGlue
Recent advancements in visual odometry systems have improved autonomous navigation, yet challenges persist in complex environments like forests, where dense foliage, variable lighting, and repetitive textures compromise the accuracy of feature correspondences. To address these challenges, we introduce ForestGlue. ForestGlue enhances the SuperPoint feature detector through four configurations (grayscale, RGB, RGB-D, and stereo-vision inputs), each optimised for a different sensing modality. For feature matching, we employ LightGlue or SuperGlue, both retrained on synthetic forest data. ForestGlue achieves pose estimation accuracy comparable to the baseline LightGlue and SuperGlue models, yet requires only 512 keypoints, just 25% of the 2048 used by the baselines, to reach an LO-RANSAC AUC score of 0.745 at a 10° threshold. By requiring only a quarter of the keypoints, ForestGlue can reduce computational overhead whilst remaining effective in dynamic forest environments, making it a promising candidate for real-time deployment on resource-constrained platforms such as drones or mobile robots. By combining ForestGlue with a novel transformer-based pose estimation model, we propose ForestVO, which estimates relative camera poses from the 2D pixel coordinates of matched features between frames. On challenging TartanAir forest sequences, ForestVO achieves an average relative pose error (RPE) of 1.09 m and a kitti_score of 2.33%, outperforming direct methods such as DSO by 40% in dynamic scenes, while remaining competitive with TartanVO despite being a significantly lighter model trained on only 10% of the dataset. This work establishes an end-to-end deep learning pipeline tailored for visual odometry in forested environments, leveraging forest-specific training data to optimise feature correspondence and pose estimation for improved accuracy and robustness in autonomous navigation systems.
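To make the matching front end concrete, the sketch below pairs a SuperPoint detector capped at 512 keypoints with a LightGlue matcher on two consecutive frames. It assumes the public cvg/LightGlue PyTorch interface with stock weights; the ForestGlue configurations retrained on forest data are not reproduced here, and the image paths are placeholders.

```python
# Minimal front-end sketch: SuperPoint keypoints + LightGlue matching on two frames.
# Assumes the public cvg/LightGlue package; the retrained ForestGlue weights are not
# available here, so stock 'superpoint' weights stand in. Image paths are hypothetical.
import torch
from lightglue import LightGlue, SuperPoint
from lightglue.utils import load_image, rbd

device = "cuda" if torch.cuda.is_available() else "cpu"

# Cap the detector at 512 keypoints, as in the ForestGlue evaluation.
extractor = SuperPoint(max_num_keypoints=512).eval().to(device)
matcher = LightGlue(features="superpoint").eval().to(device)

image0 = load_image("forest_frame_t.png").to(device)    # placeholder path
image1 = load_image("forest_frame_t1.png").to(device)   # placeholder path

with torch.no_grad():
    feats0 = extractor.extract(image0)
    feats1 = extractor.extract(image1)
    matches01 = matcher({"image0": feats0, "image1": feats1})

# Strip the batch dimension and index the matched keypoints.
feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]
matches = matches01["matches"]             # (K, 2) indices into each keypoint set
pts0 = feats0["keypoints"][matches[:, 0]]  # (K, 2) pixel coordinates in frame t
pts1 = feats1["keypoints"][matches[:, 1]]  # (K, 2) pixel coordinates in frame t+1
```

The resulting 2D correspondences are the kind of input the pose estimation stage consumes.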
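The abstract describes the pose estimator only at a high level: a transformer that regresses the relative camera pose from the 2D pixel coordinates of matched features. The sketch below is one plausible way to wire that up in PyTorch, not the published ForestVO architecture; the layer sizes, the mean pooling over matches, and the translation-plus-quaternion output head are illustrative assumptions.

```python
# Illustrative sketch of a transformer mapping matched 2D keypoint pairs to a relative pose.
# NOT the published ForestVO model; dimensions and the output parameterisation
# (translation + unit quaternion) are assumptions made for illustration.
import torch
import torch.nn as nn


class RelativePoseTransformer(nn.Module):
    def __init__(self, d_model: int = 128, nhead: int = 4, num_layers: int = 4):
        super().__init__()
        # Each token is one correspondence: (x0, y0, x1, y1) pixel coordinates.
        self.embed = nn.Linear(4, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 7),  # 3 for translation, 4 for a quaternion
        )

    def forward(self, corr: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # corr: (B, K, 4) matched pixel coordinates between two frames.
        tokens = self.encoder(self.embed(corr))
        pooled = tokens.mean(dim=1)  # permutation-invariant pooling over matches
        out = self.head(pooled)
        t = out[:, :3]                                    # relative translation
        q = nn.functional.normalize(out[:, 3:], dim=-1)   # unit quaternion for rotation
        return t, q


# Usage with the correspondences from the matching sketch above (batch of one pair):
# corr = torch.cat([pts0, pts1], dim=-1).unsqueeze(0)   # (1, K, 4)
# t, q = RelativePoseTransformer()(corr)
```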
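For reference, the reported 1.09 m figure is an average relative pose error over a sequence. The small NumPy sketch below shows the standard translational RPE computation between consecutive frames; the exact evaluation script used by the authors may differ (e.g. in alignment, scaling, or which frame pairs are compared).

```python
# Hedged sketch of average translational relative pose error (RPE).
# Poses are assumed to be 4x4 homogeneous camera-to-world matrices; the simple
# consecutive-frame mean is an assumption, not the authors' exact benchmark script.
import numpy as np


def average_translational_rpe(poses_est: list[np.ndarray], poses_gt: list[np.ndarray]) -> float:
    """Mean norm of the translation error of frame-to-frame relative motions."""
    errors = []
    for i in range(len(poses_gt) - 1):
        # Relative motion between consecutive frames for both trajectories.
        rel_gt = np.linalg.inv(poses_gt[i]) @ poses_gt[i + 1]
        rel_est = np.linalg.inv(poses_est[i]) @ poses_est[i + 1]
        # Residual transform: identity if the estimate is perfect.
        err = np.linalg.inv(rel_gt) @ rel_est
        errors.append(np.linalg.norm(err[:3, 3]))
    return float(np.mean(errors))
```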
Journal overview:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.