360° High-Resolution Depth Estimation via Uncertainty-Aware Structural Knowledge Transfer
Zidong Cao; Hao Ai; Athanasios V. Vasilakos; Lin Wang
IEEE Transactions on Artificial Intelligence, vol. 5, no. 11, pp. 5392–5402, 2024. Published online 2024-07-12. DOI: 10.1109/TAI.2024.3427068. https://ieeexplore.ieee.org/document/10596550/
Abstract
To predict high-resolution (HR) omnidirectional depth maps, existing methods typically take an HR omnidirectional image (ODI) as input and rely on fully supervised learning. In practice, however, using an HR ODI as input is often impractical on resource-constrained devices, and depth maps typically have lower resolution than color images. Therefore, in this article, we explore, for the first time, estimating an HR omnidirectional depth map directly from a low-resolution (LR) ODI when no HR depth ground-truth (GT) map is available. Our key idea is to transfer scene structural knowledge from the HR image modality and the corresponding LR depth maps, achieving HR depth estimation without any extra inference cost. Specifically, we introduce ODI super-resolution (SR) as an auxiliary task and train both tasks collaboratively in a weakly supervised manner to boost the performance of HR depth estimation. The ODI SR task extracts scene structural knowledge via uncertainty estimation. Building on this, we propose a scene structural knowledge transfer (SSKT) module with two key components. First, we employ a cylindrical implicit interpolation function (CIIF) to learn cylindrical neural interpolation weights for feature up-sampling, and share the CIIF parameters between the two tasks. Second, we propose a feature distillation (FD) loss that provides extra structural regularization to help the HR depth estimation task learn more scene structural knowledge. Extensive experiments demonstrate that our weakly supervised method outperforms baseline methods and even achieves performance comparable to fully supervised methods.
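The abstract gives no implementation details, but the two ideas it names, learned cylindrical interpolation weights and uncertainty-aware feature distillation, can be illustrated with a minimal PyTorch-style sketch. Everything below is a hypothetical reconstruction based only on the abstract, not the authors' released code: the names CylindricalImplicitUpsampler and uncertainty_weighted_fd_loss, the 4-neighbor local ensemble, and the sin-latitude cue are all assumptions. The sketch has an MLP map each HR query's offset to its nearest LR feature samples (plus a latitude term standing in for cylindrical geometry) into interpolation weights, and a distillation term that down-weights pixels where the SR branch reports high uncertainty.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def make_coord(h, w, device):
    """Normalized cell-center coordinates of an h x w grid, in [-1, 1]."""
    ys = (torch.arange(h, device=device, dtype=torch.float32) + 0.5) / h * 2 - 1
    xs = (torch.arange(w, device=device, dtype=torch.float32) + 0.5) / w * 2 - 1
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=0)  # (2, h, w), channel order (x, y)


class CylindricalImplicitUpsampler(nn.Module):
    """Hypothetical stand-in for the paper's CIIF: an MLP maps the offset from
    each HR query to its 4 nearest LR samples (plus a latitude cue) to an
    interpolation weight; the HR feature is the weighted sum of those samples.
    Sharing this module between the SR and depth branches would mirror the
    parameter sharing described in the abstract."""

    def __init__(self, hidden=64):
        super().__init__()
        # Per-neighbor input: (dx, dy, sin(latitude)) -> one weight logit.
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, 1)
        )

    def forward(self, lr_feat, out_hw):
        b, c, h, w = lr_feat.shape
        H, W = out_hw
        dev = lr_feat.device
        q = make_coord(H, W, dev).unsqueeze(0).expand(b, -1, -1, -1)         # (b,2,H,W)
        lr_coord = make_coord(h, w, dev).unsqueeze(0).expand(b, -1, -1, -1)  # (b,2,h,w)
        rx, ry = 1.0 / w, 1.0 / h  # half an LR cell in normalized units
        feats, logits = [], []
        for sx in (-1.0, 1.0):
            for sy in (-1.0, 1.0):
                grid = q.clone()
                grid[:, 0] += sx * rx
                grid[:, 1] += sy * ry
                grid = grid.clamp(-1 + 1e-6, 1 - 1e-6).permute(0, 2, 3, 1)   # (b,H,W,2)
                f = F.grid_sample(lr_feat, grid, mode="nearest", align_corners=False)
                nc = F.grid_sample(lr_coord, grid, mode="nearest", align_corners=False)
                rel = q - nc                              # offset to that LR sample
                lat = torch.sin(q[:, 1:2] * math.pi / 2)  # crude cylindrical (latitude) cue
                inp = torch.cat([rel, lat], dim=1).permute(0, 2, 3, 1)       # (b,H,W,3)
                logits.append(self.mlp(inp).permute(0, 3, 1, 2))             # (b,1,H,W)
                feats.append(f)
        weights = torch.softmax(torch.cat(logits, dim=1), dim=1)             # (b,4,H,W)
        stacked = torch.stack(feats, dim=1)                                  # (b,4,c,H,W)
        return (weights.unsqueeze(2) * stacked).sum(dim=1)                   # (b,c,H,W)


def uncertainty_weighted_fd_loss(depth_feat, sr_feat, sr_log_var):
    """Hypothetical feature-distillation term: pull depth-branch features toward
    SR-branch features, down-weighting pixels where the SR task predicts high
    uncertainty (log-variance)."""
    confidence = torch.exp(-sr_log_var)
    return (confidence * (depth_feat - sr_feat.detach()).abs()).mean()
```

As a usage illustration, `CylindricalImplicitUpsampler()(torch.randn(1, 32, 16, 32), out_hw=(64, 128))` upsamples a 16×32 feature map to 64×128; in the paper's setup the same interpolation module would be applied in both the ODI SR and depth branches, with the FD term regularizing the depth features against the SR features.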