ElevNav: Large Language Model-Guided Robot Navigation via 3D Scene Graphs in Elevator Environments
Author: Huzhenyu Zhang
Journal: Optical Memory and Neural Networks, vol. 34, no. 3, pp. 313-322 (impact factor 0.8; JCR Q4, Optics)
Publication date: 2025-09-17
Publication type: Journal Article
DOI: 10.3103/S1060992X25700109
URL: https://link.springer.com/article/10.3103/S1060992X25700109
Citation count: 0
Abstract
Cross-floor robotic navigation has become an increasingly critical capability for autonomous systems operating in multi-floor buildings. While 3D scene graphs have demonstrated promise for representing hierarchical spatial relationships, current approaches predominantly address cross-floor navigation via stairs, overlooking the practical challenges of elevator-mediated navigation in modern buildings. This paper presents ElevNav, a novel framework that bridges this gap through two key innovations: (1) automatic construction of semantically rich 3D scene graphs from RGB-D sequences with estimated camera trajectories, and (2) task decomposition using large language models to translate natural language commands into executable action sequences. Our method addresses elevator interaction through specialized action primitives such as pressing buttons, entering and exiting the elevator, and moving toward target objects. We evaluate ElevNav in complex simulated environments built using Isaac Sim, demonstrating robust performance in multi-floor navigation scenarios. To facilitate further research, we release a new dataset of elevator environments with corresponding scene graph representations, addressing a critical gap in existing 3D navigation benchmarks. The dataset is open-sourced at: https://github.com/zhanghuzhenyu/elevnav.
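The task decomposition idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `SceneGraph` class, the `plan` function, and the action strings below are hypothetical names chosen for illustration, and a simple rule-based planner stands in for the large language model. It only shows how a floor-level scene graph lookup could turn a target object into a sequence of elevator action primitives of the kind the abstract lists (pressing buttons, entering and exiting the elevator, moving toward a target).

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy hierarchical scene graph: floor number -> object names (hypothetical)."""
    floors: dict[int, set[str]] = field(default_factory=dict)

    def floor_of(self, obj: str) -> int:
        # Look up which floor node contains the requested object.
        for floor, objects in self.floors.items():
            if obj in objects:
                return floor
        raise KeyError(f"object not in scene graph: {obj}")


def plan(graph: SceneGraph, start_floor: int, target: str) -> list[str]:
    """Rule-based stand-in for the LLM step: returns an action sequence."""
    goal_floor = graph.floor_of(target)
    actions: list[str] = []
    if goal_floor != start_floor:
        # Cross-floor case: insert the elevator interaction primitives.
        actions += [
            "move_to(elevator_call_button)",
            "press_button(call)",
            "enter_elevator()",
            f"press_button(floor_{goal_floor})",
            "exit_elevator()",
        ]
    # Final step is always to approach the target object.
    actions.append(f"move_to({target})")
    return actions
```

For example, with `SceneGraph({1: {"sofa"}, 3: {"printer"}})` and a robot starting on floor 1, `plan(graph, 1, "printer")` yields the five elevator primitives followed by `move_to(printer)`, while a same-floor target such as `"sofa"` yields only the final `move_to` step.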
About the journal:
The journal covers a wide range of issues in information optics such as optical memory, mechanisms for optical data recording and processing, photosensitive materials, optical, optoelectronic and holographic nanostructures, and many other related topics. Papers on memory systems using holographic and biological structures and concepts of brain operation are also included. The journal pays particular attention to research in the field of neural net systems that may lead to a new generation of computational technologies by endowing them with intelligence.