History-Guided Prompt Generation for Vision-and-Language Navigation
Authors: Wen Guo, Zongmeng Wang, Yufan Hu, Junyu Gao
Journal: IEEE Transactions on Cybernetics (Impact Factor 10.5, JCR Q1, Automation & Control Systems)
DOI: 10.1109/tcyb.2025.3613147
Publication date: 2025-10-02 (Journal Article)
Vision-and-language navigation (VLN) has garnered extensive attention in the field of embodied artificial intelligence. VLN involves time-series information, where historical observations contain rich contextual knowledge and play a crucial role in navigation. However, current methods do not explicitly exploit the connection between the rich contextual information in history and the current environment, and they neglect adaptive learning of cues relevant to the current environment. We therefore explore a prompt-learning-based strategy that adaptively mines historical information highly relevant to the current environment to enhance the agent's perception of it, and we propose a history-guided prompt generation (HGPG) framework. Specifically, HGPG consists of two parts. The first is an entropy-based history acquisition module that assesses the uncertainty of the action probability distribution from the preceding step to determine whether historical information should be used at the current time step. The second is a prompt generation module that transforms historical context into prompt vectors by sampling from an end-to-end learned token library. These prompt tokens serve as discrete, knowledge-rich representations that encode semantic cues from historical observations in a compact form, making them easier for the decision network to understand and utilize. In addition, we share the token library across various navigation tasks, mining features common to different tasks to improve generalization to unknown environments. Extensive experimental results on four mainstream VLN benchmarks (R2R, REVERIE, SOON, R2R-CE) demonstrate the effectiveness of our proposed method. Code is available at https://github.com/Wzmshdong/HGPG.
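The entropy-based gating idea in the abstract can be illustrated with a minimal sketch: compute the Shannon entropy of the previous step's action distribution and consult history only when that distribution is uncertain. This is a hedged illustration in PyTorch, not the paper's implementation; the function names, the gating threshold, and the tensor shapes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def action_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the action distribution given raw logits.

    Shape: (batch, num_actions) -> (batch,).
    """
    probs = F.softmax(logits, dim=-1)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(probs * log_probs).sum(dim=-1)


def use_history(prev_logits: torch.Tensor, threshold: float = 1.0) -> torch.Tensor:
    """Gate historical information on the uncertainty of the preceding step.

    High entropy (a flat action distribution) suggests the agent is unsure,
    so historical context is worth consulting; low entropy (a peaked
    distribution) suggests the current observation already suffices.
    The threshold value here is illustrative, not taken from the paper.
    """
    return action_entropy(prev_logits) > threshold
```

For example, uniform logits over four actions give entropy ln 4 ≈ 1.386, which exceeds the illustrative threshold and triggers history use, while a sharply peaked distribution does not.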
Journal description:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines, or between machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.