Multi-View Spatial Context and State Constraints for Object-Goal Navigation

Impact factor 4.6; Zone 2 (Computer Science); JCR Q2 (Robotics)

IEEE Robotics and Automation Letters. Pub Date: 2025-01-13. DOI: 10.1109/LRA.2025.3529324

Chong Lu; Meiqin Liu; Zhirong Luan; Yan He; Badong Chen

Volume 10, issue 3, pages 2207-2214. Full text: https://ieeexplore.ieee.org/document/10839297/

Citations: 0

Abstract

Object-goal navigation is a challenging task in which an agent must navigate to a target based solely on visual observations. Current reinforcement learning-based methods for object-goal navigation face two major challenges. First, the agent perceives too little of the environmental context, so it lacks rich visual representations. Second, in complex environments or confined spaces, the agent tends to stop exploring novel states and becomes trapped in a deadlock from which it cannot escape. To address these issues, we propose a novel Multi-View Visual Transformer (MVVT) navigation model, which consists of two components: a multi-view visual observation representation module and an episode state constraint-based policy learning module. In the visual observation representation module, we expand the input image perspective to five views so that the agent can learn rich spatial context relationships in the environment, which provide content-rich features for subsequent policy learning. In the policy learning module, we help the agent escape deadlock by constraining the correlation of highly related states within an episode, which promotes the exploration of novel states and leads to efficient navigation. We validate our method in the AI2-Thor environment. Experimental results show that our approach outperforms current state-of-the-art methods across all metrics, with particularly notable improvements of 2.66% in success rate and 16.5% in SPL.
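The abstract reports results in terms of success rate and SPL but does not define them. As a reader aid, the sketch below computes both using the commonly used definition of SPL (Success weighted by Path Length): each successful episode contributes the ratio of the shortest path length to the longer of the agent's path and the shortest path, and failed episodes contribute zero. The `Episode` class, its field names, and the sample values are hypothetical and are not taken from the paper. The paper's own evaluation code may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """Hypothetical per-episode record; field names are illustrative only."""
    success: bool         # True if the agent stopped at the target
    shortest_path: float  # optimal start-to-target path length (assumed > 0)
    agent_path: float     # length of the path the agent actually took


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by Path Length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            # The max() caps each episode's contribution at 1.0.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


# Made-up episodes, purely to exercise the functions.
episodes = [
    Episode(success=True, shortest_path=2.0, agent_path=4.0),
    Episode(success=True, shortest_path=3.0, agent_path=3.0),
    Episode(success=False, shortest_path=5.0, agent_path=10.0),
    Episode(success=True, shortest_path=1.0, agent_path=0.5),
]
print(success_rate(episodes), spl(episodes))
```

Because SPL multiplies success by path efficiency, a method can raise SPL noticeably with only a small gain in success rate if it reaches targets by shorter routes.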


Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

Journal description: The journal publishes peer-reviewed articles that give a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in robotics and automation.