{"title":"Multi-View Spatial Context and State Constraints for Object-Goal Navigation","authors":"Chong Lu;Meiqin Liu;Zhirong Luan;Yan He;Badong Chen","doi":"10.1109/LRA.2025.3529324","DOIUrl":null,"url":null,"abstract":"Object-goal navigation is a highly challenging task where an agent must navigate to a target solely based on visual observations. Current reinforcement learning-based methods for object-goal navigation face two major challenges: first, the agent lacks sufficient perception of environmental context information, resulting in an absence of rich visual representations; second, in complex environments or confined spaces, the agent tends to stop exploring novel states, becoming trapped in a deadlock from which it cannot escape. To address these issues, we propose a novel Multi-View Visual Transformer (MVVT) navigation model, which consists of two components: a multi-view visual observation representation module and an episode state constraint-based policy learning module. In the visual observation representation module, we expand the input image perspective to five views to enable the agent to learn rich spatial context relationships of the environment, which provides content-rich feature information for subsequent policy learning. In the policy learning module, we help the agent escape deadlock by constraining the correlation of highly related states within an episode, which promotes the exploration of novel states and achieves efficient navigation. We validate our method in the AI2-Thor environment, and experimental results show that our approach outperforms current state-of-the-art methods across all metrics, with a particularly notable improvement in success rate by 2.66% and SPL metric by 16.5%.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 3","pages":"2207-2214"},"PeriodicalIF":4.6000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10839297/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Abstract
Object-goal navigation is a highly challenging task in which an agent must navigate to a target based solely on visual observations. Current reinforcement learning-based methods for object-goal navigation face two major challenges: first, the agent lacks sufficient perception of environmental context, resulting in an absence of rich visual representations; second, in complex environments or confined spaces, the agent tends to stop exploring novel states, becoming trapped in a deadlock from which it cannot escape. To address these issues, we propose a novel Multi-View Visual Transformer (MVVT) navigation model, which consists of two components: a multi-view visual observation representation module and an episode state constraint-based policy learning module. In the visual observation representation module, we expand the input image perspective to five views so that the agent can learn rich spatial context relationships of the environment, providing content-rich feature information for subsequent policy learning. In the policy learning module, we help the agent escape deadlock by constraining the correlation of highly related states within an episode, which promotes the exploration of novel states and achieves efficient navigation. We validate our method in the AI2-Thor environment, and experimental results show that our approach outperforms current state-of-the-art methods across all metrics, with particularly notable improvements of 2.66% in success rate and 16.5% in SPL.
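The abstract only outlines the two components at a high level, but their general shape can be sketched in code. Below is a minimal, hedged PyTorch sketch of one plausible reading: a shared per-view backbone whose five view tokens are fused by a transformer encoder, plus a penalty that discourages consecutive, highly correlated state embeddings within an episode. All module names, dimensions, and the exact form of the constraint (here, a thresholded cosine-similarity penalty with threshold `tau`) are assumptions for illustration, not the authors' released implementation.

```python
# Hedged sketch of the two components described in the abstract; NOT the
# authors' implementation. Backbone architecture, dimensions, and the exact
# form of the episode state constraint are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiViewEncoder(nn.Module):
    """Fuse five view images into one state embedding with a transformer.

    Assumption: each view is encoded by a shared CNN backbone, and the five
    resulting view tokens are fused by a standard transformer encoder to
    capture spatial context across views.
    """

    def __init__(self, num_views: int = 5, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(          # toy shared per-view CNN
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        self.view_pos = nn.Parameter(torch.zeros(num_views, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, 3, H, W)
        b, v = views.shape[:2]
        tokens = self.backbone(views.flatten(0, 1)).view(b, v, -1)
        tokens = self.fusion(tokens + self.view_pos)   # cross-view fusion
        return tokens.mean(dim=1)                      # pooled state embedding


def episode_state_constraint(states: torch.Tensor,
                             tau: float = 0.9) -> torch.Tensor:
    """Penalize highly correlated consecutive state embeddings in an episode.

    One plausible reading of "constraining the correlation of highly related
    states": when successive states are nearly identical (cosine similarity
    above tau), add a penalty that pushes the policy toward novel states
    instead of looping in a deadlock.
    """
    # states: (T, dim) embeddings collected over one episode
    sim = F.cosine_similarity(states[:-1], states[1:], dim=-1)
    return F.relu(sim - tau).mean()  # zero unless states are near-duplicates
```

Under this reading, the constraint would be added as an auxiliary term to the reinforcement learning loss, so gradients discourage the agent from revisiting near-identical states; how the paper actually weights or applies the constraint is not specified in the abstract.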
Journal Description
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.