SRNSD: Structure-Regularized Night-Time Self-Supervised Monocular Depth Estimation for Outdoor Scenes
Runmin Cong, Chunlei Wu, Xibin Song, Wei Zhang, Sam Kwong, Hongdong Li, Pan Ji
IEEE Transactions on Image Processing, vol. 33, pp. 5538-5550, published 2024-09-26. DOI: 10.1109/TIP.2024.3465034
https://ieeexplore.ieee.org/document/10696933/
Abstract
Deep CNNs have achieved impressive improvements in night-time self-supervised depth estimation from a monocular image. However, their performance degrades considerably compared with day-time depth estimation because of the significant domain gap, low visibility, and varying illumination between day and night images. To address these challenges, we propose SRNSD, a novel night-time self-supervised monocular depth estimation framework with structure regularization that incorporates three types of constraints: feature and depth domain adaptation, an image perspective constraint, and a cropped multi-scale consistency loss. Specifically, we adapt both the feature space and the depth output space for better night-time feature extraction and depth map prediction, and apply high- and low-frequency decoupling operations for better recovery of depth structure and texture. Meanwhile, we employ an image perspective constraint to enhance smoothness and obtain better depth maps in areas where luminosity changes abruptly. Furthermore, we introduce a simple yet effective cropped multi-scale consistency loss that exploits the consistency among depth outputs at different scales for further optimization, refining the detailed textures and structures of the predicted depth. Experimental results on the Oxford RobotCar, nuScenes, and CARLA-EPE datasets, evaluated at depth ranges of 40 m and 60 m, demonstrate that our approach outperforms state-of-the-art night-time self-supervised depth estimation methods across multiple metrics.
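The abstract does not specify the exact operators, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates two of the ideas in generic form. First, a high/low-frequency split of a depth map, assuming a simple box blur as the low-pass filter (the paper's decoupling operation may differ). Second, one hypothetical form of a cropped multi-scale consistency loss, assuming the network outputs depth at scales 1, 1/2, 1/4, ..., and that consistency is measured as the mean absolute difference between the full-resolution depth and each nearest-upsampled coarser depth inside a single crop. All function names (`box_blur`, `decouple`, `upsample_nearest`, `crop`, `cropped_multiscale_consistency`) are hypothetical.

```python
from typing import List, Tuple

Depth = List[List[float]]  # depth map as rows of per-pixel depth values


def box_blur(d: Depth, r: int = 1) -> Depth:
    """Assumed low-pass filter: mean over a (2r+1)x(2r+1) window, clipped at borders."""
    h, w = len(d), len(d[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [d[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def decouple(d: Depth, r: int = 1) -> Tuple[Depth, Depth]:
    """Split depth into low- and high-frequency parts (low + high == d up to rounding)."""
    low = box_blur(d, r)
    high = [[d[y][x] - low[y][x] for x in range(len(d[0]))] for y in range(len(d))]
    return low, high


def upsample_nearest(d: Depth, factor: int) -> Depth:
    """Nearest-neighbour upsampling: repeat each value and each row `factor` times."""
    return [[v for v in row for _ in range(factor)] for row in d for _ in range(factor)]


def crop(d: Depth, top: int, left: int, size: int) -> Depth:
    """Square crop of side `size` with its top-left corner at (top, left)."""
    return [row[left:left + size] for row in d[top:top + size]]


def cropped_multiscale_consistency(scales: List[Depth], top: int, left: int, size: int) -> float:
    """Hypothetical loss: mean |d_full - up(d_k)| inside one crop, averaged over coarser scales.

    scales[0] is full resolution; scales[k] is assumed downsampled by 2**k.
    """
    if len(scales) < 2:
        raise ValueError("need at least two scales")
    ref = crop(scales[0], top, left, size)
    total = 0.0
    for k, d in enumerate(scales[1:], start=1):
        up = crop(upsample_nearest(d, 2 ** k), top, left, size)
        diffs = [abs(a - b) for ra, rb in zip(ref, up) for a, b in zip(ra, rb)]
        total += sum(diffs) / len(diffs)
    return total / (len(scales) - 1)


if __name__ == "__main__":
    full = [[1.0, 1.0, 3.0, 3.0], [1.0, 1.0, 3.0, 3.0],
            [5.0, 5.0, 7.0, 7.0], [5.0, 5.0, 7.0, 7.0]]
    half = [[2.0, 3.0], [5.0, 7.0]]  # top-left block disagrees by 1.0
    print(cropped_multiscale_consistency([full, half], 0, 0, 4))  # 0.25
    print(cropped_multiscale_consistency([full, half], 0, 0, 2))  # 1.0
```

The second call shows why cropping can matter under this assumed formulation: restricting the comparison to a small region concentrates the loss on local disagreements between scales, which a full-image average would dilute (0.25 versus 1.0 for the same error).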