{"title":"SceneFactory: A Workflow-Centric and Unified Framework for Incremental Scene Modeling","authors":"Yijun Yuan;Michael Bleier;Andreas Nüchter","doi":"10.1109/TRO.2025.3562479","DOIUrl":null,"url":null,"abstract":"In this article, we present SceneFactory, a workflow-centric and unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multiview depth estimation, LiDAR completion, (dense) RGB-D/RGB-LiDAR (RGB-L)/Mono/Depth-only reconstruction, and simultaneous localization and mapping (SLAM). The workflow-centric design uses multiple blocks as the basis for constructing different production lines. The supported applications, i.e., productions avoid redundancy in their designs. Thus, the focus is placed on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks that form SceneFactory: first, tracking, second, flexion, third, depth estimation, and fourth, scene reconstruction. The tracking block is based on Mono SLAM and is extended to support RGB-D and RGB-L inputs. Flexion is used to convert the depth image (untrackable) into a trackable image. For general-purpose depth estimation, we propose an unposed and uncalibrated multiview depth estimation model (U<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>-MVD) to estimate dense geometry. U<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. A semantic-aware ScaleCov step is then introduced to complete the multiview depth. Relying on U<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>-MVD, SceneFactory both supports user-friendly 3-D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high-quality surface and color reconstruction, we propose dual-purpose multiresolutional neural points for the first surface accessible surface color field design, where we introduce improved point rasterization for point cloud-based surface query. We implement and experiment with SceneFactory to demonstrate its broad applicability and high flexibility. Its quality also competes or exceeds the tightly-coupled state of the art approaches in all tasks.","PeriodicalId":50388,"journal":{"name":"IEEE Transactions on Robotics","volume":"41 ","pages":"3183-3201"},"PeriodicalIF":9.4000,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10970428","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Robotics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10970428/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0
Abstract
In this article, we present SceneFactory, a workflow-centric and unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multiview depth estimation, LiDAR completion, (dense) RGB-D/RGB-LiDAR (RGB-L)/Mono/Depth-only reconstruction, and simultaneous localization and mapping (SLAM). The workflow-centric design uses multiple blocks as the basis for constructing different production lines. The supported applications, i.e., productions, avoid redundancy in their designs; thus, the focus is placed on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks that form SceneFactory: tracking, flexion, depth estimation, and scene reconstruction. The tracking block is based on Mono SLAM and is extended to support RGB-D and RGB-L inputs. The flexion block converts the (otherwise untrackable) depth image into a trackable image. For general-purpose depth estimation, we propose an unposed and uncalibrated multiview depth estimation model (U$^{2}$-MVD) to estimate dense geometry. U$^{2}$-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. A semantic-aware ScaleCov step is then introduced to complete the multiview depth. Relying on U$^{2}$-MVD, SceneFactory both supports user-friendly 3-D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high-quality surface and color reconstruction, we propose dual-purpose multiresolutional neural points for the first surface-accessible surface color field design, where we introduce improved point rasterization for point cloud-based surface queries. We implement and experiment with SceneFactory to demonstrate its broad applicability and high flexibility. Its quality also matches or exceeds that of tightly coupled state-of-the-art approaches in all tasks.
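To make the workflow-centric idea concrete, below is a minimal Python sketch of how independent blocks could be registered once and composed into different production lines. The `Factory` class, block names, and dictionary-based state interface are illustrative assumptions for this sketch, not SceneFactory's actual API.

```python
from typing import Callable, Dict, List

# Hypothetical illustration of the workflow-centric design: independent
# blocks (tracking, flexion, depth estimation, scene reconstruction) are
# registered once and composed into different "production lines"
# (applications) without duplicating block logic.

Block = Callable[[dict], dict]  # each block reads and extends a shared state dict

class Factory:
    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}

    def register(self, name: str, block: Block) -> None:
        self.blocks[name] = block

    def production_line(self, names: List[str]) -> Block:
        """Compose registered blocks into one application pipeline."""
        stages = [self.blocks[n] for n in names]

        def run(state: dict) -> dict:
            for stage in stages:
                state = stage(state)
            return state

        return run

factory = Factory()
factory.register("tracking", lambda s: {**s, "pose": "estimated"})
factory.register("flexion", lambda s: {**s, "trackable": True})
factory.register("depth", lambda s: {**s, "depth": "dense"})
factory.register("recon", lambda s: {**s, "surface": "reconstructed"})

# Different applications reuse the same blocks in different combinations:
rgbd_recon = factory.production_line(["tracking", "recon"])
mono_dense = factory.production_line(["tracking", "depth", "recon"])
depth_only = factory.production_line(["flexion", "tracking", "recon"])

print(mono_dense({"input": "images"}))
```

The point of such a design is that adding a new application is just a new composition of existing blocks, while an improvement to one block (e.g., tracking) benefits every production line that uses it.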
About the Journal:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.