Dense Reconstruction from Monocular SLAM with Fusion of Sparse Map-Points and CNN-Inferred Depth

Xiang Ji, Xinchen Ye, Hongcan Xu, Haojie Li
{"title":"Dense Reconstruction from Monocular Slam with Fusion of Sparse Map-Points and Cnn-Inferred Depth","authors":"Xiang Ji, Xinchen Ye, Hongcan Xu, Haojie Li","doi":"10.1109/ICME.2018.8486548","DOIUrl":null,"url":null,"abstract":"Real-time monocular visual SLAM approaches relying on building sparse correspondences between two or multiple views of the scene, are capable of accurately tracking camera pose and inferring structure of the environment. However, these methods have the common problem, i.e., the reconstructed 3D map is extremely sparse. Recently, convolutional neural network (CNN) is widely used for estimating scene depth from monocular color images. As we observe, sparse map-points generated from epipolar geometry are locally accurate, while CNN-inferred depth map contains high-level global context but generates blurry depth boundaries. Therefore, we propose a depth fusion framework to yield a dense monocular reconstruction that fully exploits the sparse depth samples and the CNN-inferred depth. Color key-frames are employed to guide the depth reconstruction process, avoiding smoothing over depth boundaries. Experimental results on benchmark datasets show the robustness and accuracy of our method.","PeriodicalId":426613,"journal":{"name":"2018 IEEE International Conference on Multimedia and Expo (ICME)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2018.8486548","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Real-time monocular visual SLAM approaches, which rely on building sparse correspondences between two or more views of a scene, can accurately track camera pose and infer the structure of the environment. However, these methods share a common limitation: the reconstructed 3D map is extremely sparse. Recently, convolutional neural networks (CNNs) have been widely used to estimate scene depth from monocular color images. We observe that sparse map-points generated from epipolar geometry are locally accurate, while CNN-inferred depth maps capture high-level global context but suffer from blurry depth boundaries. We therefore propose a depth fusion framework that fully exploits both the sparse depth samples and the CNN-inferred depth to yield a dense monocular reconstruction. Color key-frames guide the depth reconstruction process, preventing smoothing across depth boundaries. Experimental results on benchmark datasets demonstrate the robustness and accuracy of our method.
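The abstract does not give the paper's exact fusion energy, but the idea it describes can be illustrated concretely. Below is a minimal Python sketch of one way such a color-guided fusion could be posed as a sparse weighted least-squares problem: sparse map-point depths act as strong data anchors, the CNN depth as a weak dense prior, and a smoothness term weighted by key-frame intensity gradients relaxes across strong color edges so depth boundaries are not blurred. The function `fuse_depth`, all weight values, and the direct solver are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: weighted least-squares fusion of sparse SLAM
# depth and CNN-inferred depth, guided by the color key-frame. Weights,
# names, and the solver are assumptions, not the paper's formulation.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def fuse_depth(cnn_depth, sparse_depth, sparse_mask, gray,
               w_sparse=100.0, w_cnn=0.1, w_smooth=1.0, sigma=0.1):
    """Fuse CNN depth with sparse map-point depth under color guidance.

    cnn_depth    -- (H, W) dense depth predicted by the CNN
    sparse_depth -- (H, W) depth at projected map-points, 0 elsewhere
    sparse_mask  -- (H, W) bool, True where a map-point projects
    gray         -- (H, W) key-frame intensity in [0, 1]
    """
    H, W = cnn_depth.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)

    # Data term: strong anchors at projected map-points (locally accurate),
    # weak CNN prior everywhere else (global context).
    data_w = np.where(sparse_mask, w_sparse, w_cnn).ravel()
    data_d = np.where(sparse_mask, sparse_depth, cnn_depth).ravel()
    di = np.arange(n)
    rows, cols, vals = [di], [di], [data_w]
    b = data_w * data_d

    # Edge-aware smoothness: neighboring pixels with similar key-frame
    # color are coupled strongly; strong color edges relax the coupling,
    # so depth is not smoothed across likely depth boundaries.
    for dy, dx in [(0, 1), (1, 0)]:
        p = idx[:H - dy, :W - dx].ravel()
        q = idx[dy:, dx:].ravel()
        diff = (gray[:H - dy, :W - dx] - gray[dy:, dx:]).ravel()
        w = w_smooth * np.exp(-(diff ** 2) / (2.0 * sigma ** 2))
        # Graph-Laplacian contributions of each term w * (d_p - d_q)^2.
        rows += [p, q, p, q]
        cols += [p, q, q, p]
        vals += [w, w, -w, -w]

    # Duplicate (row, col) entries are summed on COO -> CSR conversion,
    # yielding the normal equations (D + L) d = D d_data.
    A = sp.coo_matrix((np.concatenate(vals),
                       (np.concatenate(rows), np.concatenate(cols))),
                      shape=(n, n)).tocsr()
    return spsolve(A, b).reshape(H, W)
```

For full-resolution key-frames, the direct solve would typically be replaced by conjugate gradients; the structure of the system (a positive diagonal data term plus an edge-weighted graph Laplacian) stays the same.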