DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision
Pub Date: 2022-09-14
DOI: 10.48550/arXiv.2209.06351

Kaichen Zhou, Lanqing Hong, Changhao Chen, Hang Xu, Chao Ye, Qingyong Hu, Zhenguo Li

Volume: 29 1
Pages: 125-142

Citations: 11

Abstract

Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames. However, existing methods neither fully exploit 3D point-wise geometric correspondences nor effectively handle the ambiguities in photometric warping caused by occlusions or illumination inconsistency. To address these problems, this work proposes the Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework that considers 3D spatial information and exploits stronger geometric constraints among adjacent camera frustums. Instead of directly regressing a per-pixel depth value from a single image, DevNet divides the camera frustum into multiple parallel planes and predicts the point-wise occlusion probability density on each plane. The final depth map is generated by integrating the density along the corresponding rays. During training, novel regularization strategies and loss functions are introduced to mitigate photometric ambiguities and overfitting. Without noticeably increasing model parameter count or running time, DevNet outperforms several representative baselines on both the KITTI-2015 outdoor dataset and the NYU-V2 indoor dataset. In particular, DevNet reduces the root-mean-square deviation of depth estimation by around 4% on both KITTI-2015 and NYU-V2. Code is available at https://github.com/gitkaichenzhou/DevNet.
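The step of integrating density along rays can be illustrated with a minimal NumPy sketch. It assumes a NeRF-style volume-rendering discretization (per-plane opacity, accumulated transmittance, weighted average of plane depths); the abstract does not give the exact formulation, so the function name `depth_from_density`, its arguments, and the normalization step are hypothetical choices for illustration and are not taken from the DevNet code.

```python
import numpy as np

def depth_from_density(sigma, plane_depths):
    """Illustrative only: turn per-plane densities into a depth map.

    sigma:        (D, H, W) non-negative densities on D fronto-parallel planes
    plane_depths: (D,) strictly increasing depth of each plane
    Returns (depth, weights) with shapes (H, W) and (D, H, W).
    """
    sigma = np.asarray(sigma, dtype=np.float64)
    z = np.asarray(plane_depths, dtype=np.float64)

    # Thickness of the slab behind each plane; the last slab reuses the
    # previous spacing (an assumption made for this sketch).
    delta = np.diff(z, append=z[-1] + (z[-1] - z[-2]))

    # Probability that a ray is stopped inside each slab.
    alpha = 1.0 - np.exp(-sigma * delta[:, None, None])

    # Transmittance: probability that the ray reaches plane i unoccluded.
    trans = np.cumprod(1.0 - alpha, axis=0)
    trans = np.concatenate([np.ones_like(trans[:1]), trans[:-1]], axis=0)

    # Per-plane contribution to the pixel, then the expected depth.
    weights = alpha * trans
    total = np.clip(weights.sum(axis=0), 1e-8, None)
    depth = (weights * z[:, None, None]).sum(axis=0) / total
    return depth, weights
```

When nearly all density along a ray sits on one plane, the weights concentrate there and the expected depth approaches that plane's depth; softer density profiles yield a blend of neighbouring plane depths.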

