HOZ++: Versatile Hierarchical Object-to-Zone Graph for Object Navigation

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 7, pp. 5958-5975. Pub Date: 2025-03-19. DOI: 10.1109/TPAMI.2025.3552987

Sixian Zhang, Xinhang Song, Xinyao Yu, Yubing Bai, Xinlong Guo, Weijie Li, Shuqiang Jiang

Abstract

The goal of the object navigation task is to reach target objects using visual information in unseen environments. Previous works typically implement agents as deep models trained to predict actions from visual observations. Despite extensive training, such agents often fail to make sound decisions when navigating in unseen environments toward targets that are not yet visible. In contrast, humans show a remarkable ability to navigate toward targets even in unseen environments. This ability is attributed to the cognitive map in the hippocampus, which enables humans to recall past experiences in similar situations and anticipate what lies ahead during navigation. The cognitive map is also dynamically updated with new observations from unseen environments. It equips humans with a wealth of prior knowledge, significantly enhancing their navigation capabilities. Inspired by these human navigation mechanisms, we propose the Hierarchical Object-to-Zone (HOZ++) graph, which encapsulates the regularities among objects, zones, and scenes. The HOZ++ graph helps the agent identify the current zone and the target zone, computes an optimal path between them, and selects the next zone along the path as guidance for the agent. Moreover, the HOZ++ graph is continuously updated with real-time observations in new environments, which improves its adaptability to those environments. The HOZ++ graph is versatile and can be integrated into existing methods, including end-to-end reinforcement learning (RL) and modular methods. We evaluate our method on four simulators: AI2-THOR, RoboTHOR, Gibson, and Matterport 3D. Additionally, we build a realistic environment to evaluate our method in the real world. Experimental results demonstrate the effectiveness and efficiency of the proposed method.
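The zone-level guidance loop described in the abstract (identify the current zone, identify the target zone, plan an optimal path between them, take the next zone on that path as the subgoal, and update the graph online) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: zones are represented as object-weight dictionaries, the current zone is chosen by a simple overlap score, Dijkstra's algorithm stands in for the path planner, the online update is an exponential moving average, and all function names (`identify_zone`, `locate_target_zone`, `shortest_zone_path`, `next_subgoal_zone`, `update_zone`) are hypothetical.

```python
import heapq

# Hypothetical zone graph (an assumption, not the paper's data structure):
# each zone stores an object profile (object name -> weight in [0, 1]),
# and edges store a traversal cost between adjacent zones.
ZoneProfiles = dict[str, dict[str, float]]
Edges = dict[str, dict[str, float]]


def identify_zone(observed: set[str], zones: ZoneProfiles) -> str:
    """Pick the zone whose object profile best explains the current view."""
    # Assumed score: summed profile weight of the objects currently in view.
    # Iterating over sorted ids makes ties resolve deterministically.
    return max(sorted(zones), key=lambda z: sum(zones[z].get(o, 0.0) for o in observed))


def locate_target_zone(target: str, zones: ZoneProfiles) -> str:
    """Pick the zone in which the target object is most likely to appear."""
    return max(sorted(zones), key=lambda z: zones[z].get(target, 0.0))


def shortest_zone_path(edges: Edges, start: str, goal: str) -> list[str]:
    """Dijkstra over zone nodes; returns the zone sequence from start to goal."""
    dist = {start: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in edges.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return []  # goal zone is unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def next_subgoal_zone(observed: set[str], target: str,
                      zones: ZoneProfiles, edges: Edges) -> str:
    """Return the next zone along the planned path as the agent's guidance."""
    current = identify_zone(observed, zones)
    goal = locate_target_zone(target, zones)
    path = shortest_zone_path(edges, current, goal)
    # Already in the target zone (or no path found): keep the current zone.
    return path[1] if len(path) > 1 else current


def update_zone(zones: ZoneProfiles, zone_id: str,
                observed: set[str], rate: float = 0.1) -> None:
    """Assumed online update: move the zone profile toward the latest observation."""
    profile = zones[zone_id]
    for obj in set(profile) | observed:
        seen = 1.0 if obj in observed else 0.0
        profile[obj] = (1 - rate) * profile.get(obj, 0.0) + rate * seen


if __name__ == "__main__":
    zones = {
        "kitchen_counter": {"sink": 0.9, "microwave": 0.8, "fridge": 0.3},
        "dining_area": {"table": 0.9, "chair": 0.8, "fridge": 0.1},
        "living_area": {"sofa": 0.9, "tv": 0.8, "remote": 0.6},
    }
    edges = {
        "kitchen_counter": {"dining_area": 1.0},
        "dining_area": {"kitchen_counter": 1.0, "living_area": 1.0},
        "living_area": {"dining_area": 1.0},
    }
    # Agent sees a sink and is looking for a remote.
    print(next_subgoal_zone({"sink"}, "remote", zones, edges))  # dining_area
```

In this toy layout the agent, standing at the kitchen counter, is routed through the dining area toward the living area. Re-running `next_subgoal_zone` after every step means the subgoal changes as soon as the observed objects place the agent in a new zone; how often the actual HOZ++ agent re-plans is not stated in the abstract.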
