{"title":"SceneFactory: A Workflow-Centric and Unified Framework for Incremental Scene Modeling","authors":"Yijun Yuan;Michael Bleier;Andreas Nüchter","doi":"10.1109/TRO.2025.3562479","DOIUrl":null,"url":null,"abstract":"In this article, we present SceneFactory, a workflow-centric and unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multiview depth estimation, LiDAR completion, (dense) RGB-D/RGB-LiDAR (RGB-L)/Mono/Depth-only reconstruction, and simultaneous localization and mapping (SLAM). The workflow-centric design uses multiple blocks as the basis for constructing different production lines. The supported applications, i.e., productions avoid redundancy in their designs. Thus, the focus is placed on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks that form SceneFactory: first, tracking, second, flexion, third, depth estimation, and fourth, scene reconstruction. The tracking block is based on Mono SLAM and is extended to support RGB-D and RGB-L inputs. Flexion is used to convert the depth image (untrackable) into a trackable image. For general-purpose depth estimation, we propose an unposed and uncalibrated multiview depth estimation model (U<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>-MVD) to estimate dense geometry. U<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. A semantic-aware ScaleCov step is then introduced to complete the multiview depth. Relying on U<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>-MVD, SceneFactory both supports user-friendly 3-D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high-quality surface and color reconstruction, we propose dual-purpose multiresolutional neural points for the first surface accessible surface color field design, where we introduce improved point rasterization for point cloud-based surface query. We implement and experiment with SceneFactory to demonstrate its broad applicability and high flexibility. Its quality also competes or exceeds the tightly-coupled state of the art approaches in all tasks.","PeriodicalId":50388,"journal":{"name":"IEEE Transactions on Robotics","volume":"41 ","pages":"3183-3201"},"PeriodicalIF":9.4000,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10970428","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Robotics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10970428/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0
Abstract
In this article, we present SceneFactory, a workflow-centric and unified framework for incremental scene modeling that conveniently supports a wide range of applications, such as (unposed and/or uncalibrated) multiview depth estimation, LiDAR completion, (dense) RGB-D/RGB-LiDAR (RGB-L)/Mono/Depth-only reconstruction, and simultaneous localization and mapping (SLAM). The workflow-centric design uses multiple blocks as the basis for constructing different production lines. The supported applications, i.e., productions, avoid redundancy in their designs; thus, the focus is placed on each block itself for independent expansion. To support all input combinations, our implementation consists of four building blocks that form SceneFactory: tracking, flexion, depth estimation, and scene reconstruction. The tracking block is based on Mono SLAM and is extended to support RGB-D and RGB-L inputs. The flexion block converts the (otherwise untrackable) depth image into a trackable image. For general-purpose depth estimation, we propose an unposed and uncalibrated multiview depth estimation model (U$^{2}$-MVD) to estimate dense geometry. U$^{2}$-MVD exploits dense bundle adjustment to solve for poses, intrinsics, and inverse depth. A semantic-aware ScaleCov step is then introduced to complete the multiview depth. Relying on U$^{2}$-MVD, SceneFactory both supports user-friendly 3-D creation (with just images) and bridges the applications of Dense RGB-D and Dense Mono. For high-quality surface and color reconstruction, we propose dual-purpose multiresolutional neural points for the first surface-accessible surface color field design, where we introduce improved point rasterization for point cloud-based surface queries. We implement and experiment with SceneFactory to demonstrate its broad applicability and high flexibility. Its quality also matches or exceeds that of tightly coupled state-of-the-art approaches in all tasks.
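To make the workflow-centric idea concrete, below is a minimal Python sketch of how independent blocks could be registered once and composed into different production lines. The `Factory` class, block names, and dictionary-based state interface are illustrative assumptions for this sketch, not SceneFactory's actual API.

```python
from typing import Callable, Dict, List

# Hypothetical illustration of the workflow-centric design: independent
# blocks (tracking, flexion, depth estimation, scene reconstruction) are
# registered once and composed into different "production lines"
# (applications) without duplicating block logic.

Block = Callable[[dict], dict]  # each block reads and extends a shared state dict

class Factory:
    def __init__(self) -> None:
        self.blocks: Dict[str, Block] = {}

    def register(self, name: str, block: Block) -> None:
        self.blocks[name] = block

    def production_line(self, names: List[str]) -> Block:
        """Compose registered blocks into one application pipeline."""
        stages = [self.blocks[n] for n in names]

        def run(state: dict) -> dict:
            for stage in stages:
                state = stage(state)
            return state

        return run

factory = Factory()
factory.register("tracking", lambda s: {**s, "pose": "estimated"})
factory.register("flexion", lambda s: {**s, "trackable": True})
factory.register("depth", lambda s: {**s, "depth": "dense"})
factory.register("recon", lambda s: {**s, "surface": "reconstructed"})

# Different applications reuse the same blocks in different combinations:
rgbd_recon = factory.production_line(["tracking", "recon"])
mono_dense = factory.production_line(["tracking", "depth", "recon"])
depth_only = factory.production_line(["flexion", "tracking", "recon"])

print(mono_dense({"input": "images"}))
```

The point of such a design is that adding a new application is just a new composition of existing blocks, while an improvement to one block (e.g., tracking) benefits every production line that uses it.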
About the Journal:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.