Help Me Through: Imitation Learning Based Active View Planning to Avoid SLAM Tracking Failures

Kanwal Naveed; Wajahat Hussain; Irfan Hussain; Donghwan Lee; Muhammad Latif Anjum

IEEE Transactions on Robotics, vol. 41, pp. 4236-4252, published 2025-06-24. DOI: 10.1109/TRO.2025.3582817. https://ieeexplore.ieee.org/document/11049022/
Large-scale evaluation of state-of-the-art visual simultaneous localization and mapping (SLAM) has shown that tracking performance degrades considerably if the camera view is not adjusted to avoid low-texture areas. Deep reinforcement learning (RL)-based approaches have been proposed to improve the robustness of visual tracking in such unsupervised settings. Our extensive analysis reveals the fundamental limitations of RL-based active view planning, especially in transition scenarios (entering/exiting a room, texture-less walls, and lobbies). In challenging transition scenarios, the agent generally remains unable to cross the transition during training, limiting its ability to learn the maneuver. We propose human-supervised RL training (imitation learning) and achieve significantly improved performance after $\sim$50 h of supervised training. To reduce this lengthy human-supervision requirement, we also explore fine-tuning our network with an online learning policy. Here, we use limited human-supervised training ($\sim$20 h) and fine-tune the network with unsupervised training ($\sim$45 h), obtaining encouraging results. We also release our multimodal, human-supervised training dataset. The dataset contains challenging and diverse transition scenarios and can aid the development of imitation learning policies for consistent visual tracking. We also release our implementation.
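The core idea of the human-supervised stage is imitation learning: instead of letting the RL agent discover view-planning maneuvers by trial and error, a policy is first fit to human demonstrations of the camera adjustment. A minimal sketch of this is behavior cloning, shown below with a linear softmax policy trained on synthetic (state, expert-action) pairs. Everything here is illustrative: the texture-feature states, the discrete yaw actions, and the stand-in "expert" rule are hypothetical placeholders, not the paper's actual state or action space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a state is a vector of texture features (e.g. keypoint
# density per image region); actions are discrete camera-view adjustments.
ACTIONS = ["yaw_left", "keep", "yaw_right"]
STATE_DIM, N_ACTIONS = 8, len(ACTIONS)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def clone_policy(states, expert_actions, lr=0.5, epochs=300):
    """Behavior cloning: fit a linear policy to (state, expert action) pairs
    by minimizing cross-entropy, i.e. supervised imitation of the operator."""
    W = np.zeros((STATE_DIM, N_ACTIONS))
    onehot = np.eye(N_ACTIONS)[expert_actions]
    for _ in range(epochs):
        probs = softmax(states @ W)
        # Gradient of mean cross-entropy w.r.t. W for a linear-softmax model.
        W -= lr * states.T @ (probs - onehot) / len(states)
    return W

# Synthetic demonstrations: the stand-in "expert" steers toward whichever of
# the first N_ACTIONS feature regions has the most texture.
states = rng.normal(size=(500, STATE_DIM))
expert_actions = states[:, :N_ACTIONS].argmax(axis=1)

W = clone_policy(states, expert_actions)
pred = softmax(states @ W).argmax(axis=1)
accuracy = (pred == expert_actions).mean()
print(f"imitation accuracy on demonstrations: {accuracy:.2f}")
```

In the paper's setting this supervised phase ($\sim$20-50 h of demonstrations) would be followed by unsupervised RL fine-tuning; the cloned policy simply gives the agent a starting point that can already cross the difficult transitions.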
Journal description:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.