ElevNav: Large Language Model-Guided Robot Navigation via 3D Scene Graphs in Elevator Environments

Impact factor: 0.8, JCR: Q4 (Optics)

Optical Memory and Neural Networks. Pub Date: 2025-09-17. DOI: 10.3103/S1060992X25700109

Huzhenyu Zhang

Optical Memory and Neural Networks, vol. 34, no. 3, pp. 313-322. Journal article, not open access. Publisher page: https://link.springer.com/article/10.3103/S1060992X25700109

Citations: 0

Abstract

Cross-floor robotic navigation has become an increasingly critical capability for autonomous systems operating in multi-floor buildings. While 3D scene graphs have shown promise for representing hierarchical spatial relationships, current approaches predominantly address cross-floor navigation via stairs, overlooking the practical challenges of elevator-mediated navigation in modern buildings. This paper presents ElevNav, a novel framework that bridges this gap through two key innovations: (1) automatic construction of semantically rich 3D scene graphs from RGB-D sequences with estimated camera trajectories, and (2) task decomposition using large language models to translate natural language commands into executable action sequences. Our method handles elevator interaction through specialized action primitives such as pressing buttons, entering and exiting the elevator, and moving toward target objects. We evaluate ElevNav in complex simulated environments built with Isaac Sim, demonstrating robust performance in multi-floor navigation scenarios. To facilitate further research, we release a new dataset of elevator environments with corresponding scene graph representations, addressing a critical gap in existing 3D navigation benchmarks. The dataset is open-sourced at https://github.com/zhanghuzhenyu/elevnav.
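
As a rough illustration of the two ideas in the abstract, the following minimal Python sketch pairs a toy hierarchical scene graph (floor, room, object) with a task-decomposition step that emits elevator action primitives. It is not the ElevNav implementation: the `SceneGraph` class, the `plan` function, and the primitive names (`move_to`, `press_button`, `enter_elevator`, `exit_elevator`) are hypothetical, and a simple rule-based function stands in for the large language model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy hierarchical scene graph: floor -> room -> objects (hypothetical)."""
    floors: dict[int, dict[str, list[str]]] = field(default_factory=dict)

    def add_object(self, floor: int, room: str, obj: str) -> None:
        self.floors.setdefault(floor, {}).setdefault(room, []).append(obj)

    def locate(self, obj: str) -> tuple[int, str] | None:
        """Return (floor, room) of the first match, or None if absent."""
        for floor, rooms in sorted(self.floors.items()):
            for room, objects in rooms.items():
                if obj in objects:
                    return floor, room
        return None


def plan(graph: SceneGraph, robot_floor: int, target: str) -> list[str]:
    """Rule-based stand-in for the LLM task-decomposition step (assumption)."""
    location = graph.locate(target)
    if location is None:
        raise ValueError(f"{target!r} is not in the scene graph")
    target_floor, room = location
    steps: list[str] = []
    if target_floor != robot_floor:
        # Elevator primitives are only needed when the target is on another floor.
        steps += [
            "move_to(elevator)",
            "press_button(call)",
            "enter_elevator()",
            f"press_button(floor_{target_floor})",
            "exit_elevator()",
        ]
    steps.append(f"move_to({target}, room={room})")
    return steps


graph = SceneGraph()
graph.add_object(1, "lobby", "sofa")
graph.add_object(3, "office", "printer")
print(plan(graph, robot_floor=1, target="printer"))
```

In a system like the one the abstract describes, the scene graph would presumably be serialized into the language model's prompt and the model would return the action sequence. The rule-based planner here only shows the shape of the input and output.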


Source journal

Optical Memory and Neural Networks (Optics)

CiteScore: 1.50

Self-citation rate: 11.10%

Journal description: The journal covers a wide range of issues in information optics such as optical memory, mechanisms for optical data recording and processing, photosensitive materials, optical, optoelectronic and holographic nanostructures, and many other related topics. Papers on memory systems using holographic and biological structures and concepts of brain operation are also included. The journal pays particular attention to research in the field of neural net systems that may lead to a new generation of computational technologies by endowing them with intelligence.