Misleading Supervision Removal Mechanism for self-supervised monocular depth estimation

IF 3.7 | Zone 2 (Engineering & Technology) | JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays | Pub Date: 2025-04-01 | DOI: 10.1016/j.displa.2025.103043

Xinzhou Fan, Jinze Xu, Feng Ye, Yizong Lai

Displays, Volume 88, Article 103043. https://www.sciencedirect.com/science/article/pii/S0141938225000800

Citations: 0

Abstract



Self-supervised monocular depth estimation leverages the photometric consistency assumption and exploits geometric relations between image frames to convert depth errors into reprojection photometric errors. This allows the model to train effectively without explicit depth labels. However, the photometric consistency assumption does not always hold, the geometric relationships between image frames can be inaccurate, and sensors introduce noise. As a result, the photometric error loss has limitations: it can easily introduce inaccurate supervision information and mislead the model into local optima. To address this issue, this paper introduces a Misleading Supervision Removal Mechanism (MSRM), which aims to improve the accuracy of the supervisory information by eliminating misleading cues. MSRM employs a composite masking strategy that incorporates both pixel-level and image-level masks, where the pixel-level masks include sky masks, edge masks, and edge consistency techniques. MSRM largely eliminates the misleading supervision information introduced by sky regions, edge regions, and images with low viewpoint changes. Because it does not alter the network architecture, MSRM does not increase inference time, making it a plug-and-play solution. Experiments applying MSRM to various self-supervised monocular depth estimation algorithms on the KITTI, Cityscapes, and Make3D datasets demonstrate that it significantly improves the prediction accuracy and generalization performance of the original algorithms.
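The abstract does not say how the masks are computed or how they enter the loss, so the following is only a minimal sketch of the general idea of masking a photometric loss, not the authors' method. It assumes a plain L1 photometric error on grayscale images, boolean pixel-level masks supplied by the caller, and an image-level rule that skips frame pairs whose camera translation is below a threshold. The function name, the parameters, and the threshold value are all hypothetical, and the edge consistency technique is not modeled.

```python
from typing import List, Optional

Image = List[List[float]]  # grayscale image as rows of intensities in [0, 1]
Mask = List[List[bool]]    # True = pixel is trusted for supervision


def masked_photometric_loss(
    target: Image,
    reprojected: Image,
    sky_mask: Mask,
    edge_mask: Mask,
    camera_translation: float,
    min_translation: float = 0.05,  # hypothetical threshold, not from the paper
) -> Optional[float]:
    """Mean L1 photometric error over trusted pixels, or None if the frame is skipped."""
    # Image-level mask (assumed form): drop frame pairs with too little viewpoint change.
    if camera_translation < min_translation:
        return None

    total, count = 0.0, 0
    for y, row in enumerate(target):
        for x, value in enumerate(row):
            # Pixel-level mask (assumed form): keep a pixel only if every mask trusts it.
            if sky_mask[y][x] and edge_mask[y][x]:
                total += abs(value - reprojected[y][x])
                count += 1
    # No trusted pixels means no supervision signal for this frame.
    return total / count if count else None


# Toy 2x2 example: the top-right pixel is treated as sky and excluded.
target = [[0.2, 0.4], [0.6, 0.8]]
reprojected = [[0.2, 0.9], [0.5, 0.8]]
sky_mask = [[True, False], [True, True]]
edge_mask = [[True, True], [True, True]]

loss = masked_photometric_loss(target, reprojected, sky_mask, edge_mask, camera_translation=0.3)
skipped = masked_photometric_loss(target, reprojected, sky_mask, edge_mask, camera_translation=0.01)
```

In this sketch the masks only change which pixels and frames contribute to the training loss, which is consistent with the abstract's point that the network architecture and inference time are unaffected.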


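Source journal

Displays (Engineering & Technology - Engineering, Electrical & Electronic)

CiteScore: 4.60
Self-citation rate: 25.60%
Articles published: 138
Review time: 92 days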

Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.