Memory-Maze: Scenario Driven Visual Language Navigation Benchmark for Guiding Blind People

Impact factor: 5.3 · CAS Region 2 (Computer Science) · JCR Q2 (Robotics)
Masaki Kuribayashi;Kohei Uehara;Allan Wang;Daisuke Sato;Renato Alexandre Ribeiro;Simon Chu;Shigeo Morishima
{"title":"Memory-Maze: Scenario Driven Visual Language Navigation Benchmark for Guiding Blind People","authors":"Masaki Kuribayashi;Kohei Uehara;Allan Wang;Daisuke Sato;Renato Alexandre Ribeiro;Simon Chu;Shigeo Morishima","doi":"10.1109/LRA.2025.3615028","DOIUrl":null,"url":null,"abstract":"Visual Language Navigation (VLN) powered robots have the potential to guide blind people by understanding route instructions provided by sighted passersby. This capability allows robots to operate in environments often unknown a prior. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes described from human memory, which frequently contains stutters, errors, and omissions of details, as opposed to those obtained by thinking out loud, such as in the R2R dataset. However, existing benchmarks do not contain instructions obtained from human memory in natural environments. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. Our analysis demonstrates that instruction data collected from memory was longer and contained more varied wording. We further demonstrate that addressing errors and ambiguities from memory-based instructions is challenging, by evaluating state-of-the-art models alongside our baseline model with modularized perception and controls.","PeriodicalId":13241,"journal":{"name":"IEEE Robotics and Automation Letters","volume":"10 11","pages":"11658-11665"},"PeriodicalIF":5.3000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics and Automation Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11181068/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
引用次数: 0

Abstract

Visual Language Navigation (VLN) powered robots have the potential to guide blind people by understanding route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes described from human memory, which frequently contain stutters, errors, and omissions of details, as opposed to routes obtained by thinking out loud, such as in the R2R dataset. However, existing benchmarks do not contain instructions obtained from human memory in natural environments. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. Our analysis demonstrates that instructions collected from memory were longer and contained more varied wording. We further demonstrate that addressing errors and ambiguities in memory-based instructions is challenging, by evaluating state-of-the-art models alongside our baseline model with modularized perception and controls.
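As a rough, illustrative sketch of the kind of comparison the abstract reports (memory-based instructions being longer and more varied in wording than think-aloud instructions), the snippet below computes an average word count and a type-token ratio for two small sets of made-up route instructions. The example sentences and the choice of metric are assumptions for illustration only; they are not taken from the paper's dataset or analysis.

```python
# Minimal sketch (not from the paper): compare length and wording variety
# between memory-based and think-aloud route instructions.
from statistics import mean


def avg_length(instructions):
    """Average instruction length in words."""
    return mean(len(text.split()) for text in instructions)


def type_token_ratio(instructions):
    """Distinct words divided by total words, a rough proxy for wording variety."""
    tokens = [w.lower() for text in instructions for w in text.split()]
    return len(set(tokens)) / len(tokens)


# Hypothetical instructions, invented for illustration only.
memory_based = [
    "Um, go straight, I think past two or three doors, then, uh, turn left near the posters",
    "Walk down the hall, maybe fifty meters, there might be a turn after the vending machine",
]
think_aloud = [
    "Go straight past two doors and turn left",
    "Walk down the hall and turn right at the vending machine",
]

for name, data in [("memory-based", memory_based), ("think-aloud", think_aloud)]:
    print(f"{name}: avg length = {avg_length(data):.1f} words, "
          f"type-token ratio = {type_token_ratio(data):.2f}")
```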
Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.