History-Guided Prompt Generation for Vision-and-Language Navigation

IF 10.5, Zone 1 (Computer Science), JCR Q1 AUTOMATION & CONTROL SYSTEMS

IEEE Transactions on Cybernetics, Pub Date: 2025-10-02, DOI: 10.1109/tcyb.2025.3613147

Wen Guo, Zongmeng Wang, Yufan Hu, Junyu Gao


Citations: 0

Abstract



Vision-and-language navigation (VLN) has attracted extensive attention in embodied artificial intelligence. VLN involves time-series information, in which historical observations carry rich contextual knowledge and play a crucial role in navigation. However, current methods do not explicitly mine the connection between the contextual information in the history and the current environment, and they ignore adaptive learning of cues related to the current environment. We therefore explore a prompt-learning strategy that adaptively mines the historical information most relevant to the current environment in order to enhance the agent's perception of it, and we propose a history-guided prompt generation (HGPG) framework. HGPG has two parts. The first is an entropy-based history acquisition module, which assesses the uncertainty of the action probability distribution from the preceding step to decide whether historical information should be used at the current time step. The second is a prompt generation module, which transforms historical context into prompt vectors by sampling from a token library learned end to end. These prompt tokens serve as discrete, knowledge-rich representations that encode semantic cues from historical observations in a compact form, making them easier for the decision network to understand and use. In addition, the token library is shared across navigation tasks, mining features common to different tasks to improve generalization to unknown environments. Extensive experiments on four mainstream VLN benchmarks (R2R, REVERIE, SOON, R2R-CE) demonstrate the effectiveness of the proposed method. Code is available at https://github.com/Wzmshdong/HGPG.
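The entropy-based gating idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see their repository for that). The function names, the normalisation by log(n), the threshold of 0.5, and the direction of the gate (history is consulted when the previous step was uncertain) are all assumptions made only for illustration.

```python
import math
from typing import Sequence


def action_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_use_history(prev_action_probs: Sequence[float],
                       threshold: float = 0.5) -> bool:
    """Hypothetical gate: consult history when the previous step was uncertain.

    The entropy is divided by log(n) so that it lies in [0, 1]. Both the
    normalisation and the threshold value are assumptions, not taken
    from the paper.
    """
    n = len(prev_action_probs)
    if n < 2:
        return False  # a single candidate action carries no uncertainty
    normalised = action_entropy(prev_action_probs) / math.log(n)
    return normalised > threshold


# A peaked distribution (confident previous step) versus a uniform one.
confident = [0.94, 0.02, 0.02, 0.02]
uncertain = [0.25, 0.25, 0.25, 0.25]
print(should_use_history(confident))  # False: normalised entropy is about 0.21
print(should_use_history(uncertain))  # True: normalised entropy is 1.0
```

Normalising by log(n) keeps one threshold meaningful when the number of candidate actions changes between viewpoints, which is one plausible reason to prefer it over raw entropy.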


Source journal

IEEE Transactions on Cybernetics COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, CYBERNETICS

CiteScore

25.40

Self-citation rate

11.00%

Articles published

1869

About the journal: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.