Open-Vocabulary Indoor Object Grounding with 3D Hierarchical Scene Graph

Impact factor: 0.8 (JCR Q4, Optics)

Optical Memory and Neural Networks. Published: 2025-09-17. DOI: 10.3103/S1060992X25600673

S. Linok, G. Naumov

Optical Memory and Neural Networks, vol. 34, no. 3, pp. 323-333. Journal article; not open access. Publisher page: https://link.springer.com/article/10.3103/S1060992X25600673


Abstract

We propose OVIGo-3DHSG, a method for Open-Vocabulary Indoor Grounding of objects using a 3D Hierarchical Scene Graph. OVIGo-3DHSG represents a large indoor environment as a hierarchical scene graph built from sequences of RGB-D frames, using a set of open-vocabulary foundation models and sensor data processing. The hierarchical representation explicitly models spatial relations across floors, rooms, locations, and objects. To address complex queries that refer to an object by its spatial relation to other objects, we integrate the hierarchical scene graph with a Large Language Model for multistep reasoning. This integration uses both inter-layer (e.g., room-to-object) and intra-layer (e.g., object-to-object) connections, which improves understanding of spatial context. We evaluate the semantic and geometric accuracy of the hierarchical representation on multi-floor scenes from the Habitat Matterport 3D Semantic dataset. Our approach demonstrates efficient scene comprehension and robust object grounding compared to existing methods. Overall, OVIGo-3DHSG shows strong potential for applications that require spatial reasoning and understanding of indoor environments. Related materials can be found at https://github.com/linukc/OVIGo-3DHSG.
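To make the layered structure described in the abstract more concrete, the following is a minimal illustrative sketch of a four-layer scene graph with inter-layer (parent) edges and intra-layer (relation) edges. It is not taken from the OVIGo-3DHSG code base: the class names, the `near` relation, the demo scene, and the `ground` helper are all assumptions made for illustration, and the rule-based filter in `ground` only stands in for the LLM-driven multistep reasoning that the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Layer order from coarse to fine, as listed in the abstract.
LAYERS = ("floor", "room", "location", "object")


@dataclass
class Node:
    node_id: str
    layer: str
    label: str
    parent: Optional[str] = None  # inter-layer edge to a coarser node
    # Intra-layer edges stored as (relation name, other node id).
    relations: List[Tuple[str, str]] = field(default_factory=list)


class SceneGraph:
    """Hypothetical container; not the authors' data structure."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, node_id: str, layer: str, label: str,
            parent: Optional[str] = None) -> None:
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        if parent is not None:
            parent_layer = self.nodes[parent].layer
            # A parent must belong to a strictly coarser layer.
            if LAYERS.index(parent_layer) >= LAYERS.index(layer):
                raise ValueError("parent must be in a coarser layer")
        self.nodes[node_id] = Node(node_id, layer, label, parent)

    def relate(self, a: str, relation: str, b: str) -> None:
        # Intra-layer edges connect nodes of the same layer only.
        if self.nodes[a].layer != self.nodes[b].layer:
            raise ValueError("intra-layer edge needs same-layer nodes")
        self.nodes[a].relations.append((relation, b))

    def ancestors(self, node_id: str) -> Dict[str, str]:
        """Walk inter-layer edges upward; return {layer: label}."""
        found: Dict[str, str] = {}
        current = self.nodes[node_id].parent
        while current is not None:
            node = self.nodes[current]
            found[node.layer] = node.label
            current = node.parent
        return found

    def ground(self, label: str, room: Optional[str] = None,
               near: Optional[str] = None) -> List[str]:
        """Rule-based stand-in for LLM reasoning over the graph."""
        matches = []
        for node in self.nodes.values():
            if node.layer != "object" or node.label != label:
                continue
            if room is not None and self.ancestors(node.node_id).get("room") != room:
                continue
            if near is not None and not any(
                rel == "near" and self.nodes[other].label == near
                for rel, other in node.relations
            ):
                continue
            matches.append(node.node_id)
        return matches


def build_demo_graph() -> SceneGraph:
    """Tiny made-up two-floor scene used only for illustration."""
    g = SceneGraph()
    g.add("f0", "floor", "floor 0")
    g.add("f1", "floor", "floor 1")
    g.add("r_kitchen", "room", "kitchen", parent="f0")
    g.add("r_bedroom", "room", "bedroom", parent="f1")
    g.add("loc_counter", "location", "counter", parent="r_kitchen")
    g.add("sink_1", "object", "sink", parent="loc_counter")
    g.add("mug_1", "object", "mug", parent="loc_counter")
    g.add("mug_2", "object", "mug", parent="r_bedroom")
    g.relate("mug_1", "near", "sink_1")
    return g
```

In this sketch, a query such as "the mug near the sink in the kitchen" would be resolved by calling `build_demo_graph().ground("mug", room="kitchen", near="sink")`: the room constraint is checked through inter-layer edges and the `near` constraint through intra-layer edges. Objects are allowed to attach directly to a room when no intermediate location exists, which is a simplification chosen here rather than something the abstract specifies.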


Source journal

Optical Memory and Neural Networks (Optics)

CiteScore: 1.50

Self-citation rate: 11.10%


Journal description: The journal covers a wide range of issues in information optics, such as optical memory, mechanisms for optical data recording and processing, photosensitive materials, optical, optoelectronic and holographic nanostructures, and many other related topics. Papers on memory systems using holographic and biological structures and concepts of brain operation are also included. The journal pays particular attention to research in the field of neural net systems that may lead to a new generation of computational technologies by endowing them with intelligence.