Geometric Edge Modelling in Self-Supervised Learning for Enhanced Indoor Depth Estimation

IF 1.3 | Zone 4 (Computer Science) | JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Computer Vision Pub Date : 2025-05-12 DOI:10.1049/cvi2.70026

Niclas Joswig, Laura Ruotsalainen


Citation count: 0

Abstract

Recently, the accuracy of self-supervised deep learning models for indoor depth estimation has approached that of supervised models by improving supervision in planar regions. However, integrating multiple planar priors commonly produces oversmooth depth maps, leading to unrealistic and erroneous depth representations at edges. Although edge pixels cover only a small part of the image, they are highly significant for downstream tasks such as visual odometry, where the image features essential for motion computation are mostly located at edges. To improve depth predictions at edge regions, we analyse the self-supervised training process, identify its limitations, and use these insights to develop a geometric edge model. Building on this, we introduce a novel algorithm that uses the smooth depth predictions of existing models together with colour image data to accurately identify edge pixels. Our approach then generates targeted self-supervision in these zones by interpolating depth values from adjacent planar areas towards the edges. We integrate the proposed algorithms into a novel loss function that encourages neural networks to predict sharper and more accurate depth edges in indoor scenes. To validate our methodology, we incorporated the proposed edge-enhancing loss function into a state-of-the-art self-supervised depth estimation framework. Our results demonstrate a notable improvement in the accuracy of edge depth predictions and, compared to the baseline model, a 19% improvement in visual odometry when our depth model is used to generate RGB-D input.
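
The abstract does not give the algorithm itself, so the following is only a minimal one-dimensional sketch of the general idea, not the authors' method. It assumes a single depth edge per image row, takes the edge zone to be the steep part of an oversmooth depth profile, places the edge at the strongest colour-intensity jump inside that zone, and extends straight-line fits of the planar depth on either side up to that edge. The function names (sharpen_scanline, edge_loss) and the parameters (depth_thresh, fit_len) are hypothetical and chosen for illustration.

```python
import numpy as np


def sharpen_scanline(depth, gray, depth_thresh=0.05, fit_len=4):
    """Build a sharpened depth target for one image row (illustrative only).

    Assumptions (not from the paper): one edge per row, locally planar
    depth on both sides, and a colour edge that coincides with the depth edge.
    Returns (target, mask), where mask marks the edge-zone pixels.
    """
    depth = np.asarray(depth, dtype=float)
    gray = np.asarray(gray, dtype=float)
    n = depth.size
    target = depth.copy()
    mask = np.zeros(n, dtype=bool)

    # Edge zone: pixels spanned by the steep part of the smooth depth ramp.
    steep = np.flatnonzero(np.abs(np.diff(depth)) > depth_thresh)
    if steep.size == 0:
        return target, mask
    lo, hi = int(steep[0]), int(steep[-1]) + 1

    # Edge position: strongest colour jump inside the zone,
    # located between pixels e and e + 1.
    e = lo + int(np.argmax(np.abs(np.diff(gray))[lo:hi]))

    # Planar support pixels just outside the zone on each side.
    left_idx = np.arange(max(lo - fit_len, 0), lo)
    right_idx = np.arange(hi + 1, min(hi + 1 + fit_len, n))
    if left_idx.size < 2 or right_idx.size < 2:
        return target, mask

    # Fit a line to each planar side and extend it up to the edge.
    left_fit = np.polyfit(left_idx, depth[left_idx], 1)
    right_fit = np.polyfit(right_idx, depth[right_idx], 1)
    zone_left = np.arange(lo, e + 1)
    zone_right = np.arange(e + 1, hi + 1)
    target[zone_left] = np.polyval(left_fit, zone_left)
    target[zone_right] = np.polyval(right_fit, zone_right)
    mask[lo:hi + 1] = True
    return target, mask


def edge_loss(pred, target, mask):
    """Mean absolute error between prediction and target over edge-zone pixels."""
    if not mask.any():
        return 0.0
    pred = np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(pred[mask] - target[mask])))


if __name__ == "__main__":
    # Synthetic row: a wall at 2 m next to a wall at 4 m, edge between pixels 9 and 10.
    true_depth = np.r_[np.full(10, 2.0), np.full(10, 4.0)]
    gray = np.r_[np.full(10, 0.2), np.full(10, 0.8)]
    smooth = true_depth.copy()
    smooth[7:13] = np.linspace(2.0, 4.0, 8)[1:-1]  # oversmooth ramp across the edge

    target, mask = sharpen_scanline(smooth, gray)
    print("edge-zone pixels:", np.flatnonzero(mask))
    print("loss of smooth prediction:", edge_loss(smooth, target, mask))
```

On this synthetic row the target restores the step between the two planes, so a network penalised with edge_loss would be pushed away from the ramp and towards a sharp edge. A real implementation would operate on full 2D depth maps and would be combined with the existing self-supervised losses of the host framework.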


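Source journal

IET Computer Vision (Engineering and Technology - Engineering: Electrical and Electronic)

CiteScore: 3.30
Self-citation rate: 11.80%
Articles published:
Review time: 3.4 months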

About the journal: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest quality research work that is relevant and topical to the field, without forgetting those works that aim to introduce new horizons and set the agenda for future avenues of research in computer vision.

IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low level vision (feature detection, etc.); Perceptual grouping and organisation; Representation, analysis and matching of 2D and 3D shape; Shape-from-X; Object recognition; Image understanding; Learning with visual inputs; Motion analysis and object tracking; Multiview scene analysis; Cognitive approaches in low, mid and high level vision; Control in visual systems; Colour, reflectance and light; Statistical and probabilistic models; Face and gesture; Surveillance; Biometrics and security; Robotics; Vehicle guidance; Automatic model acquisition; Medical image analysis and understanding; Aerial scene analysis and remote sensing; Deep learning models in computer vision.

Both methodological and applications orientated papers are welcome. Manuscripts submitted are expected to include a detailed and analytical review of the literature, a state-of-the-art exposition of the original proposed research and its methodology, a thorough experimental evaluation, and, last but not least, a comparative evaluation against relevant and state-of-the-art methods. Submissions not abiding by these minimum requirements may be returned to authors without being sent to review.

Special Issues, Current Call for Papers:
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf