Help Me Through: Imitation Learning Based Active View Planning to Avoid SLAM Tracking Failures

Kanwal Naveed; Wajahat Hussain; Irfan Hussain; Donghwan Lee; Muhammad Latif Anjum

IEEE Transactions on Robotics, vol. 41, pp. 4236-4252, published 2025-06-24. DOI: 10.1109/TRO.2025.3582817. https://ieeexplore.ieee.org/document/11049022/
Large-scale evaluation of state-of-the-art visual simultaneous localization and mapping (SLAM) has shown that tracking performance degrades considerably if the camera view is not adjusted to avoid low-texture areas. Deep reinforcement learning (RL)-based approaches have been proposed to improve the robustness of visual tracking in such unsupervised settings. Our extensive analysis reveals the fundamental limitations of RL-based active view planning, especially in transition scenarios (entering/exiting a room, texture-less walls, and lobbies). In challenging transition scenarios, the agent generally remains unable to cross the transition during training, limiting its ability to learn the maneuver. We propose human-supervised RL training (imitation learning) and achieve significantly improved performance after $\sim$50 h of supervised training. To reduce this lengthy human-supervision requirement, we also explore fine-tuning our network with an online learning policy. Here, we use limited human-supervised training ($\sim$20 h) and fine-tune the network with unsupervised training ($\sim$45 h), obtaining encouraging results. We also release our multimodal, human-supervised training dataset. The dataset contains challenging and diverse transition scenarios and can aid the development of imitation learning policies for consistent visual tracking. We also release our implementation.
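The core idea of the human-supervised stage is imitation learning: instead of letting the RL agent discover view-planning maneuvers by trial and error, a policy is first fit to human demonstrations of the camera adjustment. A minimal sketch of this is behavior cloning, shown below with a linear softmax policy trained on synthetic (state, expert-action) pairs. Everything here is illustrative: the texture-feature states, the discrete yaw actions, and the stand-in "expert" rule are hypothetical placeholders, not the paper's actual state or action space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a state is a vector of texture features (e.g. keypoint
# density per image region); actions are discrete camera-view adjustments.
ACTIONS = ["yaw_left", "keep", "yaw_right"]
STATE_DIM, N_ACTIONS = 8, len(ACTIONS)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def clone_policy(states, expert_actions, lr=0.5, epochs=300):
    """Behavior cloning: fit a linear policy to (state, expert action) pairs
    by minimizing cross-entropy, i.e. supervised imitation of the operator."""
    W = np.zeros((STATE_DIM, N_ACTIONS))
    onehot = np.eye(N_ACTIONS)[expert_actions]
    for _ in range(epochs):
        probs = softmax(states @ W)
        # Gradient of mean cross-entropy w.r.t. W for a linear-softmax model.
        W -= lr * states.T @ (probs - onehot) / len(states)
    return W

# Synthetic demonstrations: the stand-in "expert" steers toward whichever of
# the first N_ACTIONS feature regions has the most texture.
states = rng.normal(size=(500, STATE_DIM))
expert_actions = states[:, :N_ACTIONS].argmax(axis=1)

W = clone_policy(states, expert_actions)
pred = softmax(states @ W).argmax(axis=1)
accuracy = (pred == expert_actions).mean()
print(f"imitation accuracy on demonstrations: {accuracy:.2f}")
```

In the paper's setting this supervised phase ($\sim$20-50 h of demonstrations) would be followed by unsupervised RL fine-tuning; the cloned policy simply gives the agent a starting point that can already cross the difficult transitions.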
Journal description:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.