Memory-Maze: Scenario Driven Visual Language Navigation Benchmark for Guiding Blind People

IF 5.3 | Zone 2, Computer Science | JCR Q2, Robotics

IEEE Robotics and Automation Letters Pub Date : 2025-09-26 DOI:10.1109/LRA.2025.3615028

Masaki Kuribayashi;Kohei Uehara;Allan Wang;Daisuke Sato;Renato Alexandre Ribeiro;Simon Chu;Shigeo Morishima

IEEE Robotics and Automation Letters, vol. 10, no. 11, pp. 11658-11665. https://ieeexplore.ieee.org/document/11181068/

Citations: 0

Abstract

Robots powered by Visual Language Navigation (VLN) have the potential to guide blind people by understanding route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for guiding blind people, because in this scenario the robot must understand routes described from human memory. Such descriptions frequently contain stutters, errors, and omitted details, unlike instructions obtained by thinking out loud, as in the R2R dataset. However, existing benchmarks do not contain instructions obtained from human memory in natural environments. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data collected from human memory. Our analysis demonstrates that instructions collected from memory were longer and contained more varied wording. By evaluating state-of-the-art models alongside our baseline model with modularized perception and controls, we further demonstrate that addressing the errors and ambiguities in memory-based instructions is challenging.


Source journal

IEEE Robotics and Automation Letters Computer Science-Computer Science Applications

CiteScore

9.60

Self-citation rate

15.40%

Articles published

1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.