Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion

Impact Factor: 13.7
Shenglun Chen, Xinzhu Ma, Hong Zhang, Haojie Li, Zhihui Wang
{"title":"Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion","authors":"Shenglun Chen;Xinzhu Ma;Hong Zhang;Haojie Li;Zhihui Wang","doi":"10.1109/TIP.2025.3597047","DOIUrl":null,"url":null,"abstract":"Depth completion is a pivotal challenge in computer vision, aiming at reconstructing the dense depth map from a sparse one, typically with a paired RGB image. Existing learning-based models rely on carefully prepared but limited data, leading to significant performance degradation in out-of-distribution (OOD) scenarios. Recent foundation models have demonstrated exceptional robustness in monocular depth estimation through large-scale training, and using such models to enhance the robustness of depth completion models is a promising solution. In this work, we propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training. Specifically, we leverage a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions. We further design a dual-space propagation approach, without any learnable parameters, to effectively propagate sparse depth in both 3D and 2D spaces to maintain geometric structure and local consistency. To refine the intricate structure, we introduce a learnable correction module to progressively adjust the depth prediction towards the real depth. We train our model on the NYUv2 and KITTI datasets as in-distribution datasets and extensively evaluate the framework on 16 other datasets. Our framework performs remarkably well in the OOD scenarios and outperforms existing state-of-the-art depth completion methods. Our models are released in <uri>https://github.com/shenglunch/PSD</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5285-5299"},"PeriodicalIF":13.7000,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11125857/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Depth completion is a pivotal challenge in computer vision, aiming to reconstruct a dense depth map from a sparse one, typically with the aid of a paired RGB image. Existing learning-based models rely on carefully prepared but limited data, leading to significant performance degradation in out-of-distribution (OOD) scenarios. Recent foundation models have demonstrated exceptional robustness in monocular depth estimation through large-scale training, and using such models to enhance the robustness of depth completion models is a promising solution. In this work, we propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training. Specifically, we use a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions. We further design a dual-space propagation approach, without any learnable parameters, that propagates sparse depth in both 3D and 2D spaces to maintain geometric structure and local consistency. To refine intricate structures, we introduce a learnable correction module that progressively adjusts the depth prediction toward the true depth. We train our model on NYUv2 and KITTI as in-distribution datasets and extensively evaluate the framework on 16 other datasets. Our framework performs remarkably well in OOD scenarios and outperforms existing state-of-the-art depth completion methods. Our models are released at https://github.com/shenglunch/PSD.
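The abstract describes propagating sparse depth measurements into missing regions under the guidance of a depth foundation model. The snippet below is a minimal, illustrative sketch of one such guided propagation step in 2D image space, assuming a dense (possibly relative) guidance depth map from a monocular foundation model is already available. The function name, the exponential affinity, and the 4-neighbor update are assumptions made for illustration; they are not the paper's dual-space (3D + 2D) propagation scheme or its learnable correction module.

```python
import numpy as np

def propagate_sparse_depth(sparse_depth, guide_depth, iters=50, sigma=0.05):
    """Fill empty pixels of a sparse depth map by averaging already-filled
    4-neighbors, weighted by how similar the foundation model's guidance
    depth is at the two pixels (similar guidance suggests the same surface).

    sparse_depth : (H, W) array, 0 where no measurement exists.
    guide_depth  : (H, W) dense (relative) depth from a foundation model.
    """
    depth = sparse_depth.astype(np.float64).copy()
    valid = sparse_depth > 0

    for _ in range(iters):
        acc = np.zeros_like(depth)
        wsum = np.zeros_like(depth)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb_depth = np.roll(depth, (dy, dx), axis=(0, 1))
            nb_valid = np.roll(valid, (dy, dx), axis=(0, 1))
            nb_guide = np.roll(guide_depth, (dy, dx), axis=(0, 1))
            # Affinity from the guidance map: neighbors whose
            # foundation-model depth is similar contribute more.
            w = np.exp(-np.abs(guide_depth - nb_guide) / sigma) * nb_valid
            acc += w * nb_depth
            wsum += w
        fill = (~valid) & (wsum > 1e-6)
        depth[fill] = acc[fill] / wsum[fill]
        valid |= fill
        if valid.all():
            break
    # Note: np.roll wraps at image borders; a real implementation would pad.
    return depth
```

A typical use would pass a LiDAR-projected sparse map as `sparse_depth` and a monocular foundation-model prediction as `guide_depth`; since the guidance may be relative rather than metric, `sigma` only controls how strictly guidance similarity gates the propagation. The paper itself additionally propagates in 3D space and refines the result with a learnable correction module.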