Misleading Supervision Removal Mechanism for self-supervised monocular depth estimation
Authors: Xinzhou Fan, Jinze Xu, Feng Ye, Yizong Lai
Journal: Displays, vol. 88, Article 103043 (Journal Article, published 2025-04-01)
DOI: 10.1016/j.displa.2025.103043
URL: https://www.sciencedirect.com/science/article/pii/S0141938225000800
Citations: 0
Abstract
Self-supervised monocular depth estimation leverages the photometric consistency assumption and exploits geometric relations between image frames to convert depth errors into reprojection photometric errors. This allows the model to be trained effectively without explicit depth labels. However, the photometric error loss has limitations: the photometric consistency assumption does not always hold, the geometric relationships between image frames may be inaccurate, and sensors introduce noise. These factors can easily introduce inaccurate supervision and mislead the model into local optima. To address this issue, this paper introduces a Misleading Supervision Removal Mechanism (MSRM), which aims to improve the accuracy of the supervisory signal by eliminating misleading cues. MSRM employs a composite masking strategy that incorporates both pixel-level and image-level masks, where pixel-level masks include sky masks, edge masks, and edge consistency techniques. MSRM largely eliminates the misleading supervision introduced by sky regions, edge regions, and images with low viewpoint changes. Because it does not alter the network architecture, MSRM adds no inference time, making it a plug-and-play solution. Implemented across various self-supervised monocular depth estimation algorithms, MSRM significantly improves the prediction accuracy and generalization performance of the original algorithms in experiments on the KITTI, Cityscapes, and Make3D datasets.
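As a rough illustration of the composite masking idea in the abstract, the NumPy sketch below shows how a pixel-level mask and an image-level mask might jointly gate a photometric loss. The abstract does not say how the masks are computed or how the loss is formulated, so the function name `masked_photometric_loss`, the plain L1 error, and the ready-made boolean mask inputs are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def masked_photometric_loss(target, warped, pixel_mask, image_valid):
    """Mean L1 photometric error over the pixels that survive both masks.

    Hypothetical helper, not taken from the paper.
    target, warped: float arrays of shape (B, H, W, C)
    pixel_mask:     bool array (B, H, W); True keeps a pixel
                    (e.g. it is neither sky nor an edge region)
    image_valid:    bool array (B,); True keeps a whole image
                    (e.g. the frame pair has enough viewpoint change)
    """
    # Per-pixel L1 error, averaged over colour channels -> (B, H, W)
    per_pixel = np.abs(target - warped).mean(axis=-1)
    # A pixel contributes only if its own mask AND its image's mask allow it
    keep = pixel_mask & image_valid[:, None, None]
    n_kept = keep.sum()
    if n_kept == 0:
        return 0.0  # nothing trustworthy left to supervise with
    return float((per_pixel * keep).sum() / n_kept)


# Toy batch of two 2x2 RGB images (all values are made up).
target = np.zeros((2, 2, 2, 3))
warped = np.full((2, 2, 2, 3), 0.1)
warped[0, 0, 0] = 0.5  # assumed "sky" pixel with a large, misleading error
warped[1] = 0.9        # assumed frame pair with too little viewpoint change

pixel_mask = np.ones((2, 2, 2), dtype=bool)
pixel_mask[0, 0, 0] = False              # drop the sky pixel
image_valid = np.array([True, False])    # drop the second image entirely

all_pixels = np.ones((2, 2, 2), dtype=bool)
all_images = np.array([True, True])

loss_masked = masked_photometric_loss(target, warped, pixel_mask, image_valid)
loss_unmasked = masked_photometric_loss(target, warped, all_pixels, all_images)
print(loss_masked, loss_unmasked)
```

In this toy batch the masked loss is driven only by the three remaining pixels of the first image, while the unmasked loss is dominated by the excluded regions. Because the masks act only on the training loss, a scheme of this kind leaves the depth network and its inference cost unchanged, which is consistent with the plug-and-play claim above.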
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.