HOZ++: Versatile Hierarchical Object-to-Zone Graph for Object Navigation

Sixian Zhang; Xinhang Song; Xinyao Yu; Yubing Bai; Xinlong Guo; Weijie Li; Shuqiang Jiang
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 7, pp. 5958-5975
DOI: 10.1109/TPAMI.2025.3552987
Publication date: 2025-03-19
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10933537/
Citations: 0

Abstract

The goal of the object navigation task is to reach a target object in an unseen environment using visual information. Previous works typically implement deep models as agents trained to predict actions from visual observations. Despite extensive training, these agents often fail to make sound decisions when navigating unseen environments toward targets that are not in view. In contrast, humans navigate toward targets remarkably well even in unseen environments. This capability is attributed to the cognitive map in the hippocampus, which enables humans to recall past experiences in similar situations and anticipate future occurrences during navigation. The cognitive map is also dynamically updated with new observations from unseen environments, and it equips humans with a wealth of prior knowledge that significantly enhances their navigation capabilities. Inspired by human navigation mechanisms, we propose the Hierarchical Object-to-Zone (HOZ++) graph, which captures the regularities among objects, zones, and scenes. The HOZ++ graph helps the agent identify the current zone and the target zone, computes an optimal path between them, and selects the next zone along that path as guidance for the agent. Moreover, the HOZ++ graph is continuously updated from real-time observations, which improves its adaptability to new environments. The HOZ++ graph is versatile and can be integrated into existing methods, including end-to-end RL and modular methods. We evaluate our method on four simulators: AI2-THOR, RoboTHOR, Gibson, and Matterport 3D. Additionally, we build a realistic environment to evaluate our method in the real world. Experimental results demonstrate the effectiveness and efficiency of the proposed method.
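
The zone-guidance step described in the abstract (locate the current and target zones, compute a path between them, and take the next zone on that path as guidance) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how zones are represented, how edges are weighted, or which path algorithm is used. The sketch assumes a hand-written zone graph with made-up edge costs and uses Dijkstra's algorithm as a stand-in; the names `zone_graph` and `next_zone` are hypothetical.

```python
import heapq

# Hypothetical zone graph: zone -> {neighbouring zone: traversal cost}.
# Zone names and costs are illustrative assumptions, not values from the paper.
zone_graph = {
    "sink_area": {"counter_area": 1.0, "table_area": 1.0},
    "counter_area": {"sink_area": 1.0, "fridge_area": 1.0},
    "table_area": {"sink_area": 1.0, "fridge_area": 3.0},
    "fridge_area": {"counter_area": 1.0, "table_area": 3.0},
    "balcony": {},  # isolated zone, unreachable from the others
}


def next_zone(graph, current, target):
    """Return the first zone after `current` on a lowest-cost path to `target`.

    Returns `target` if the agent is already in the target zone,
    and None if no path exists.
    """
    if current == target:
        return target

    dist = {current: 0.0}       # best known cost from `current` to each zone
    prev = {}                   # predecessor of each zone on its best path
    heap = [(0.0, current)]     # priority queue of (cost, zone)

    while heap:
        d, zone = heapq.heappop(heap)
        if d > dist.get(zone, float("inf")):
            continue            # stale queue entry, a cheaper path was found
        if zone == target:
            break
        for neighbour, cost in graph.get(zone, {}).items():
            new_d = d + cost
            if new_d < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_d
                prev[neighbour] = zone
                heapq.heappush(heap, (new_d, neighbour))

    if target not in prev:
        return None             # target zone is unreachable

    # Walk back from the target until we reach the zone adjacent to `current`.
    zone = target
    while prev[zone] != current:
        zone = prev[zone]
    return zone


guidance = next_zone(zone_graph, "sink_area", "fridge_area")
print(guidance)
```

In this toy graph the route through `counter_area` costs 2.0 and the route through `table_area` costs 4.0, so the sketch selects `counter_area` as the guidance zone. In a learned system the chosen zone would presumably be fed to the navigation policy as an additional input rather than printed.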