Self-supervised Depth Estimation with High Resolution Features and Non-local Information

2022 14th International Conference on Machine Learning and Computing (ICMLC), Pub Date: 2022-02-18, DOI: 10.1145/3529836.3529907

Rongying Jing, Yang Liu


Citations: 0


Abstract

Depth estimation is one of the most challenging tasks in computer vision, especially in the self-supervised setting, which avoids the need for costly ground-truth labels. Self-supervised depth estimation aims to infer three-dimensional scene structure from two-dimensional images, using only image pairs or sequences as supervision. Most existing methods adopt an encoder-decoder framework with skip connections and recover high-resolution depth maps from two kinds of feature maps: high-resolution low-level ones and low-resolution high-level ones. However, it has been shown that high-resolution high-level feature maps, which are sensitive to illumination, color, texture, and similar cues, are necessary for depth estimation. In this paper, we present a novel approach that extracts high-level feature maps at all scales and introduces a self-attention mechanism to take non-local features into account. The main improvements of the proposed method are two-fold: 1) we incorporate a high-resolution feature extraction sub-network and extract high-resolution high-level features by connecting the high-to-low resolution convolution streams in parallel; 2) we embed the self-attention module in the feature pyramid module (FPA) to obtain general context for large-scale features. Experiments on the KITTI benchmark demonstrate that our network outperforms most existing methods and produces more accurate depth maps.
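To make the "non-local" idea in the abstract concrete, the following is a minimal sketch of residual self-attention over the positions of a flattened feature map. It is not the authors' implementation: the function names are hypothetical, and the query, key, and value are assumed to be the input features themselves, whereas a real network would typically use learned projections such as 1x1 convolutions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_attention(features):
    """Apply residual self-attention across all spatial positions.

    `features` is a list of N positions, each a list of C channel values
    (an H x W map flattened to N = H * W). Assumption for illustration:
    query, key, and value are the input itself, with no learned weights.
    """
    channels = len(features[0])
    scale = 1.0 / math.sqrt(channels)
    output = []
    for query in features:
        # Similarity of this position to every position, near or far.
        scores = [
            scale * sum(q * k for q, k in zip(query, key))
            for key in features
        ]
        weights = softmax(scores)
        # The weighted sum over all positions is the non-local context.
        context = [
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(channels)
        ]
        # The residual connection keeps the original local feature.
        output.append([x + ctx for x, ctx in zip(query, context)])
    return output


# A 2 x 2 feature map with 2 channels, flattened to 4 positions.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
attended = non_local_attention(feature_map)
```

In this toy input, positions 0 and 2 carry the same feature, so each attends more strongly to the other than to positions 1 and 3, regardless of how far apart they sit in the map. That distance-independent weighting is what distinguishes non-local context from the fixed neighborhood of a convolution.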
