Metric-Semantic Factor Graph Generation based on Graph Neural Networks
Jose Andres Millan-Romera, Hriday Bavle, Muhammad Shaheer, Holger Voos, Jose Luis Sanchez-Lopez
arXiv - CS - Robotics, published 2024-09-18, DOI: arxiv-2409.11972 (https://doi.org/arxiv-2409.11972)
Abstract
Understanding the relationships between geometric structures and semantic
concepts is crucial for building accurate models of complex environments. In
indoor settings, certain spatial constraints, such as the relative positioning of
planes, remain consistent despite variations in layout. This paper explores how
these invariant relationships can be captured in a graph SLAM framework by
representing high-level concepts like rooms and walls, linking them to
geometric elements like planes through an optimizable factor graph. Several
efforts have tackled this issue with ad hoc solutions for generating each
concept and with manually defined factors. This paper proposes a novel method
for metric-semantic factor graph
generation, which includes defining a semantic scene graph, integrating
geometric information, and learning the interconnecting factors, all based on
Graph Neural Networks (GNNs). An edge classification network (G-GNN) sorts the
edges between planes into three types: same room, same wall, or none. The resulting
relations are clustered, generating a room or wall for each cluster. A second
family of networks (F-GNN) infers the geometrical origin of the new nodes. The
definition of the factors employs the same F-GNN used for the metric attribute
of the generated nodes. Furthermore, the new factor graph is shared with the
S-Graphs+ algorithm, extending its graph expressiveness and scene
representation with the ultimate goal of improving SLAM performance. Training
the networks on L-shaped rooms extends the supported environment complexity to
N-plane rooms. The framework is evaluated in synthetic and
simulated scenarios, as no real datasets of the required complex layouts are
available.
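
The classify-then-cluster step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering method is used, so this example assumes a simple connected-components grouping (union-find) over edges that share a label. The function name `cluster_planes`, the label strings, and the six-plane edge list are hypothetical and only stand in for the output of an edge classifier such as the G-GNN.

```python
from collections import defaultdict


def cluster_planes(num_planes, labeled_edges, relation):
    """Group plane indices linked by edges labeled `relation`.

    Assumption: clusters are connected components; the paper's actual
    clustering procedure may differ.
    """
    parent = list(range(num_planes))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, label in labeled_edges:
        if label == relation:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(num_planes):
        groups[find(i)].append(i)

    # A room or wall node is only meaningful for two or more planes.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical classifier output: (plane_a, plane_b, predicted_label).
edges = [
    (0, 1, "same_room"),
    (1, 2, "same_room"),
    (2, 3, "same_room"),
    (3, 4, "same_wall"),
    (4, 5, "none"),
]

rooms = cluster_planes(6, edges, "same_room")
walls = cluster_planes(6, edges, "same_wall")
print(rooms)  # [[0, 1, 2, 3]]
print(walls)  # [[3, 4]]
```

Each returned cluster would correspond to one new room or wall node in the scene graph. Inferring the metric attributes of those nodes and the factors that link them to the planes is the part the abstract assigns to the learned F-GNN, and it is not reproduced here.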