Fast and robust learned single-view depth-aided monocular visual-inertial initialization

The International Journal of Robotics Research, Pub Date: 2024-07-25, DOI: 10.1177/02783649241262452

Nathaniel Merrill, Patrick Geneva, Saimouli Katragadda, Chuchu Chen, Guoquan Huang



Abstract

In monocular visual-inertial navigation, it is desirable to initialize the system as quickly and robustly as possible. State-of-the-art initialization methods typically construct a linear system to find a closed-form solution from image features and inertial measurements, and then refine the states with a nonlinear optimization. These methods generally require a few seconds of data, although this can be reduced to less than a second by adding constraints from a robust but only up-to-scale monocular depth network in the nonlinear optimization. To further accelerate this process, in this work we instead leverage the scale-less depth measurements in the linear initialization step that precedes the nonlinear one, which requires only a single depth image for the first frame. Importantly, we show that the typical independent estimation of all feature states in the closed-form solution can be modeled as estimating only the scale and bias parameters of the learned depth map. As such, our formulation enables building a smaller minimal problem than the state of the art, which can be seamlessly integrated into RANSAC for robust estimation. Experiments show that our method has state-of-the-art initialization performance in simulation as well as on popular real-world datasets (TUM-VI and EuRoC MAV). For the TUM-VI dataset, both in simulation and in the real world, we demonstrate superior initialization performance with only a 0.3 s window of data, which is the smallest ever reported, and validate that our method can initialize more often, more robustly, and more accurately in different challenging scenarios.
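To make the "scale and bias" idea concrete, the following is a minimal sketch, not the authors' formulation. It assumes that each feature's metric depth z_i is related to the network's depth prediction d_i by z_i ≈ a * d_i + b, and that some reference depths z_i are available to fit against. In the paper, the two parameters are instead solved jointly with the inertial states in the closed-form linear system. The function names, the threshold, and the iteration count are all hypothetical choices made for illustration. The point it illustrates is that two unknowns, rather than one depth per feature, give a minimal sample of only two points, which is what makes a RANSAC loop cheap.

```python
import numpy as np


def fit_scale_bias(d, z):
    """Least-squares fit of the affine model z ≈ a * d + b."""
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float)
    A = np.column_stack([d, np.ones_like(d)])
    (a, b), *_ = np.linalg.lstsq(A, z, rcond=None)
    return float(a), float(b)


def ransac_scale_bias(d, z, iters=200, thresh=0.05, seed=0):
    """Robustly estimate (a, b) with a 2-point minimal sample (illustrative only)."""
    d = np.asarray(d, dtype=float)
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(d), dtype=bool)
    for _ in range(iters):
        # Two points determine the two unknowns (scale a, bias b).
        i, j = rng.choice(len(d), size=2, replace=False)
        if abs(d[i] - d[j]) < 1e-9:
            continue  # degenerate sample: the scale is unobservable
        a = (z[i] - z[j]) / (d[i] - d[j])
        b = z[i] - a * d[i]
        inliers = np.abs(a * d + b - z) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 2:
        raise ValueError("RANSAC found no usable model")
    # Refine on all inliers of the best hypothesis.
    a, b = fit_scale_bias(d[best_inliers], z[best_inliers])
    return a, b, best_inliers
```

Calling `ransac_scale_bias(d, z)` with arrays of predicted and reference depths returns the refined scale, the bias, and a boolean inlier mask. Features flagged as outliers would be excluded before any subsequent nonlinear refinement.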


