LA-Net: Layout-Aware Dense Network for Monocular Depth Estimation

Proceedings of the 26th ACM International Conference on Multimedia. Pub Date: 2018-10-15. DOI: 10.1145/3240508.3240628

Kecheng Zheng, Zhengjun Zha, Yang Cao, X. Chen, Feng Wu

Citations: 11

Abstract

Depth estimation from monocular images is an ill-posed and inherently ambiguous problem. Recently, deep learning techniques have been applied to monocular depth estimation in search of data-driven solutions. However, most existing methods focus on minimizing the average pixel-level depth regression error and neglect to encode the global layout of the scene, resulting in layout-inconsistent depth maps. This paper proposes a novel Layout-Aware Convolutional Neural Network (LA-Net) for accurate monocular depth estimation that simultaneously perceives scene layout and local depth details. Specifically, a Spatial Layout Network (SL-Net) is proposed to learn a layout map representing the depth ordering between local patches. A Layout-Aware Depth Estimation Network (LDE-Net) is proposed to estimate pixel-level depth details using multi-scale layout maps as structural guidance, leading to layout-consistent depth maps. A dense network module is used as the base network to learn effective visual details through dense feed-forward connections. Moreover, we formulate an order-sensitive softmax loss to better constrain the ill-posed depth inference problem. Extensive experiments on both indoor (NYUD-v2) and outdoor (Make3D) datasets demonstrate that the proposed LA-Net outperforms state-of-the-art methods and leads to faithful 3D projections.
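The abstract names an order-sensitive softmax loss but does not give its formulation. The sketch below is therefore not the authors' loss. It is one illustrative way to make a softmax loss over discretized depth bins sensitive to ordering, by adding the expected bin distance from the target to the usual cross-entropy. The function name `order_aware_softmax_loss`, the `alpha` weight, and the use of depth bins are all assumptions made for this example.

```python
import numpy as np


def order_aware_softmax_loss(logits, target_bin, alpha=1.0):
    """Hypothetical order-aware loss over K ordered depth bins.

    logits:     (N, K) array-like of unnormalized scores per depth bin.
    target_bin: (N,) array-like of integer indices of the true bin.
    alpha:      weight of the ordering penalty (assumed hyperparameter).
    """
    logits = np.asarray(logits, dtype=np.float64)
    target_bin = np.asarray(target_bin, dtype=np.int64)

    # Numerically stable softmax over the bin axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    n, k = probs.shape

    # Standard cross-entropy: ignores how far a wrong bin is from the target.
    ce = -np.log(probs[np.arange(n), target_bin] + 1e-12)

    # Ordering penalty: expected absolute bin distance from the target,
    # so probability mass on distant bins costs more than on nearby ones.
    bins = np.arange(k)
    dist = np.abs(bins[None, :] - target_bin[:, None])
    order_penalty = (probs * dist).sum(axis=1)

    return float((ce + alpha * order_penalty).mean())


if __name__ == "__main__":
    # Same confidence in a wrong bin, but at different distances from bin 0.
    near = order_aware_softmax_loss([[0, 5, 0, 0, 0]], [0])
    far = order_aware_softmax_loss([[0, 0, 0, 0, 5]], [0])
    print(near < far)
```

With `alpha=0` the two cases in the usage example receive the same loss, because plain cross-entropy depends only on the probability assigned to the target bin. The added term is what distinguishes a near miss from a far one.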
