Multimodal 3D Map Reconstruction for Intelligent Robotics Using Neural Network-Based Methods
Author: D. A. Yudin
Journal: Doklady Mathematics, vol. 110, no. 1 (supplement), pp. S117-S125
Published: 2025-03-22 (journal article)
DOI: 10.1134/S1064562424602014
Article page: https://link.springer.com/article/10.1134/S1064562424602014
Citations: 0
Abstract
Methods for constructing multimodal 3D maps are becoming increasingly important for robot navigation systems. In such maps, each 3D point or object stores, in addition to its color and semantic category, compressed vector representations (embeddings) of a text description or a sound. This makes it possible for a robot to move to objects specified by natural-language queries, even queries that do not mention the object explicitly. This article proposes an original taxonomy of neural network-based methods for constructing multimodal 3D maps. It shows that sparse methods, which represent the scene as an object graph and use large language models to answer spatial and semantic queries, achieve the most promising results on existing open benchmarks. Based on this analysis, the article formulates recommendations for choosing methods to solve practical problems in intelligent robotics.
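To make the idea of a sparse, object-level multimodal map concrete, here is a minimal sketch under stated assumptions; it is not the author's method or any published system. Each object node stores a semantic label, a centroid position, a mean color, a text caption, and an embedding, and nodes within a distance threshold are joined by "near" edges. The names `ObjectNode`, `ObjectGraphMap`, `embed_text`, `query`, and `query_near`, the `near_radius` parameter, and the scene data are all hypothetical. A normalized bag-of-words vector stands in for a learned text or vision-language encoder, and a simple neighbor filter stands in for the large-language-model reasoning step the abstract mentions.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


def embed_text(text: str) -> dict[str, float]:
    """Hypothetical stand-in for a learned text encoder: an L2-normalized
    sparse bag-of-words vector. Real systems use neural encoders instead."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(v * b.get(token, 0.0) for token, v in a.items())


@dataclass
class ObjectNode:
    node_id: int
    label: str                            # semantic category
    position: tuple[float, float, float]  # object centroid in the map frame, meters
    color: tuple[int, int, int]           # mean RGB color of the object
    caption: str                          # text description of the object
    embedding: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Embed label and caption together so either can match a query.
        if not self.embedding:
            self.embedding = embed_text(f"{self.label} {self.caption}")


class ObjectGraphMap:
    """Hypothetical sparse map: one node per object plus undirected 'near' edges."""

    def __init__(self, near_radius: float = 1.0) -> None:
        self.nodes: dict[int, ObjectNode] = {}
        self.edges: set[frozenset[int]] = set()
        self.near_radius = near_radius

    def add(self, node: ObjectNode) -> None:
        # Connect the new object to every existing object within near_radius.
        for other in self.nodes.values():
            if math.dist(node.position, other.position) <= self.near_radius:
                self.edges.add(frozenset((node.node_id, other.node_id)))
        self.nodes[node.node_id] = node

    def neighbors(self, node_id: int) -> list[int]:
        # IDs of objects sharing a 'near' edge with node_id, in ascending order.
        return sorted(next(iter(e - {node_id})) for e in self.edges if node_id in e)

    def query(self, text: str, top_k: int = 1) -> list[ObjectNode]:
        # Semantic query: rank objects by embedding similarity to the text.
        q = embed_text(text)
        ranked = sorted(self.nodes.values(), key=lambda n: cosine(q, n.embedding), reverse=True)
        return ranked[:top_k]

    def query_near(self, target_text: str, anchor_text: str) -> ObjectNode | None:
        # Spatial query ("target near anchor"): a toy graph filter standing in
        # for LLM reasoning; candidates are restricted to the anchor's neighbors.
        anchor = self.query(anchor_text)[0]
        candidates = [self.nodes[i] for i in self.neighbors(anchor.node_id)]
        if not candidates:
            return None
        q = embed_text(target_text)
        return max(candidates, key=lambda n: cosine(q, n.embedding))


if __name__ == "__main__":
    m = ObjectGraphMap(near_radius=1.0)
    m.add(ObjectNode(0, "sink", (0.0, 0.0, 0.9), (200, 200, 200), "white kitchen sink with faucet"))
    m.add(ObjectNode(1, "mug", (0.5, 0.2, 0.9), (180, 30, 30), "red ceramic mug for coffee"))
    m.add(ObjectNode(2, "sofa", (4.0, 3.0, 0.4), (60, 60, 120), "blue sofa in the living room"))
    # The query never says "mug"; it matches through the caption word "coffee".
    print(m.query("something to drink coffee from")[0].label)  # -> mug
    print(m.query_near("coffee mug", "kitchen sink").label)    # -> mug
```

Storing one node per object rather than one entry per 3D point is what keeps such a map sparse, and the graph edges give spatial queries something to reason over. In a real system the word-overlap similarity would be replaced by learned embeddings, so that a query can match an object through meaning rather than shared words.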
About the journal:
Doklady Mathematics is a journal of the Presidium of the Russian Academy of Sciences. It contains English translations of papers published in Doklady Akademii Nauk (Proceedings of the Russian Academy of Sciences), which was founded in 1933 and is published 36 times a year. Doklady Mathematics covers the following areas: mathematics, mathematical physics, computer science, control theory, and computers. It publishes brief scientific reports on significant, previously unpublished research in mathematics and its applications. The main contributors to the journal are Members and Corresponding Members of the RAS, as well as scientists from the former Soviet Union and other countries. Contributors include outstanding Russian mathematicians.