Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion

Impact Factor: 13.7
Shenglun Chen, Xinzhu Ma, Hong Zhang, Haojie Li, Zhihui Wang
{"title":"Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion","authors":"Shenglun Chen;Xinzhu Ma;Hong Zhang;Haojie Li;Zhihui Wang","doi":"10.1109/TIP.2025.3597047","DOIUrl":null,"url":null,"abstract":"Depth completion is a pivotal challenge in computer vision, aiming at reconstructing the dense depth map from a sparse one, typically with a paired RGB image. Existing learning-based models rely on carefully prepared but limited data, leading to significant performance degradation in out-of-distribution (OOD) scenarios. Recent foundation models have demonstrated exceptional robustness in monocular depth estimation through large-scale training, and using such models to enhance the robustness of depth completion models is a promising solution. In this work, we propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training. Specifically, we leverage a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions. We further design a dual-space propagation approach, without any learnable parameters, to effectively propagate sparse depth in both 3D and 2D spaces to maintain geometric structure and local consistency. To refine the intricate structure, we introduce a learnable correction module to progressively adjust the depth prediction towards the real depth. We train our model on the NYUv2 and KITTI datasets as in-distribution datasets and extensively evaluate the framework on 16 other datasets. Our framework performs remarkably well in the OOD scenarios and outperforms existing state-of-the-art depth completion methods. Our models are released in <uri>https://github.com/shenglunch/PSD</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5285-5299"},"PeriodicalIF":13.7000,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11125857/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Depth completion is a pivotal challenge in computer vision, aiming to reconstruct a dense depth map from a sparse one, typically with the aid of a paired RGB image. Existing learning-based models rely on carefully prepared but limited data, leading to significant performance degradation in out-of-distribution (OOD) scenarios. Recent foundation models have demonstrated exceptional robustness in monocular depth estimation through large-scale training, and using such models to enhance the robustness of depth completion models is a promising solution. In this work, we propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training. Specifically, we use a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions. We further design a dual-space propagation approach, without any learnable parameters, that propagates sparse depth in both 3D and 2D spaces to maintain geometric structure and local consistency. To refine intricate structures, we introduce a learnable correction module that progressively adjusts the depth prediction toward the true depth. We train our model on NYUv2 and KITTI as in-distribution datasets and extensively evaluate the framework on 16 other datasets. Our framework performs remarkably well in OOD scenarios and outperforms existing state-of-the-art depth completion methods. Our models are released at https://github.com/shenglunch/PSD.
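The abstract describes propagating sparse depth measurements into missing regions under the guidance of a depth foundation model. The snippet below is a minimal, illustrative sketch of one such guided propagation step in 2D image space, assuming a dense (possibly relative) guidance depth map from a monocular foundation model is already available. The function name, the exponential affinity, and the 4-neighbor update are assumptions made for illustration; they are not the paper's dual-space (3D + 2D) propagation scheme or its learnable correction module.

```python
import numpy as np

def propagate_sparse_depth(sparse_depth, guide_depth, iters=50, sigma=0.05):
    """Fill empty pixels of a sparse depth map by averaging already-filled
    4-neighbors, weighted by how similar the foundation model's guidance
    depth is at the two pixels (similar guidance suggests the same surface).

    sparse_depth : (H, W) array, 0 where no measurement exists.
    guide_depth  : (H, W) dense (relative) depth from a foundation model.
    """
    depth = sparse_depth.astype(np.float64).copy()
    valid = sparse_depth > 0

    for _ in range(iters):
        acc = np.zeros_like(depth)
        wsum = np.zeros_like(depth)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb_depth = np.roll(depth, (dy, dx), axis=(0, 1))
            nb_valid = np.roll(valid, (dy, dx), axis=(0, 1))
            nb_guide = np.roll(guide_depth, (dy, dx), axis=(0, 1))
            # Affinity from the guidance map: neighbors whose
            # foundation-model depth is similar contribute more.
            w = np.exp(-np.abs(guide_depth - nb_guide) / sigma) * nb_valid
            acc += w * nb_depth
            wsum += w
        fill = (~valid) & (wsum > 1e-6)
        depth[fill] = acc[fill] / wsum[fill]
        valid |= fill
        if valid.all():
            break
    # Note: np.roll wraps at image borders; a real implementation would pad.
    return depth
```

A typical use would pass a LiDAR-projected sparse map as `sparse_depth` and a monocular foundation-model prediction as `guide_depth`; since the guidance may be relative rather than metric, `sigma` only controls how strictly guidance similarity gates the propagation. The paper itself additionally propagates in 3D space and refines the result with a learnable correction module.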