{"title":"WS-SfMLearner: self-supervised monocular depth and ego-motion estimation on surgical videos with unknown camera parameters.","authors":"Ange Lou, Jack Noble","doi":"10.1117/1.JMI.12.2.025003","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Accurate depth estimation in surgical videos is a pivotal component of numerous image-guided surgery procedures. However, creating ground truth depth maps for surgical videos is often infeasible due to challenges such as inconsistent illumination and sensor noise. As a result, self-supervised depth and ego-motion estimation frameworks are gaining traction, eliminating the need for manually annotated depth maps. Despite the progress, current self-supervised methods still rely on known camera intrinsic parameters, which are frequently unavailable or unrecorded in surgical environments. We address this gap by introducing a self-supervised system capable of jointly predicting depth maps, camera poses, and intrinsic parameters, providing a comprehensive solution for depth estimation under such constraints.</p><p><strong>Approach: </strong>We developed a self-supervised depth and ego-motion estimation framework, incorporating a cost volume-based auxiliary supervision module. This module provides additional supervision for predicting camera intrinsic parameters, allowing for robust estimation even without predefined intrinsics. The system was rigorously evaluated on a public dataset to assess its effectiveness in simultaneously predicting depth, camera pose, and intrinsic parameters.</p><p><strong>Results: </strong>The experimental results demonstrated that the proposed method significantly improved the accuracy of ego-motion and depth prediction, even when compared with methods incorporating known camera intrinsics. In addition, by integrating our cost volume-based supervision, the accuracy of camera parameter estimation, including intrinsic parameters, was further enhanced.</p><p><strong>Conclusions: </strong>We present a self-supervised system for depth, ego-motion, and intrinsic parameter estimation, effectively overcoming the limitations imposed by unknown or missing camera intrinsics. The experimental results confirm that the proposed method outperforms the baseline techniques, offering a robust solution for depth estimation in complex surgical video scenarios, with broader implications for improving image-guided surgery systems.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 2","pages":"025003"},"PeriodicalIF":1.9000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12041500/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1117/1.JMI.12.2.025003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/30 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Abstract
Purpose: Accurate depth estimation in surgical videos is a pivotal component of numerous image-guided surgery procedures. However, creating ground truth depth maps for surgical videos is often infeasible due to challenges such as inconsistent illumination and sensor noise. As a result, self-supervised depth and ego-motion estimation frameworks are gaining traction, eliminating the need for manually annotated depth maps. Despite the progress, current self-supervised methods still rely on known camera intrinsic parameters, which are frequently unavailable or unrecorded in surgical environments. We address this gap by introducing a self-supervised system capable of jointly predicting depth maps, camera poses, and intrinsic parameters, providing a comprehensive solution for depth estimation under such constraints.
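To make the joint-prediction idea concrete, the following is a minimal PyTorch sketch, not the authors' code, of the standard self-supervised view-synthesis objective used by SfMLearner-style methods, with the intrinsics K taken as a network output rather than a fixed calibration. All function names, shapes, and numerical details are illustrative assumptions.

```python
# A minimal sketch (not the authors' implementation) of self-supervised
# view synthesis with *predicted* intrinsics: depth, pose, and K are all
# network outputs, and the photometric reprojection error supervises them.
import torch
import torch.nn.functional as F

def reproject(depth, pose, K):
    """Warp target pixel coordinates into the source view.

    depth: (B, 1, H, W) predicted depth for the target frame
    pose:  (B, 4, 4) predicted target->source rigid transform
    K:     (B, 3, 3) predicted camera intrinsics (hypothetical network head)
    Returns a (B, H, W, 2) sampling grid in [-1, 1] for F.grid_sample.
    """
    B, _, H, W = depth.shape
    device = depth.device
    # Pixel grid in homogeneous coordinates: (3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1)                    # (B, 3, H*W)

    # Backproject to 3-D using the predicted depth and intrinsics.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)   # (B, 3, H*W)

    # Rigid transform into the source camera, then project with K again.
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src = K @ (pose @ cam_h)[:, :3]                             # (B, 3, H*W)
    uv = src[:, :2] / src[:, 2:3].clamp(min=1e-6)

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    return torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

def photometric_loss(target, source, depth, pose, K):
    """L1 photometric error between the target frame and the source frame
    warped through the predicted geometry; gradients flow to depth, pose,
    and the predicted intrinsics alike."""
    grid = reproject(depth, pose, K)
    warped = F.grid_sample(source, grid, padding_mode="border",
                           align_corners=True)
    return (warped - target).abs().mean()
```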
Approach: We developed a self-supervised depth and ego-motion estimation framework, incorporating a cost volume-based auxiliary supervision module. This module provides additional supervision for predicting camera intrinsic parameters, allowing for robust estimation even without predefined intrinsics. The system was rigorously evaluated on a public dataset to assess its effectiveness in simultaneously predicting depth, camera pose, and intrinsic parameters.
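The paper's exact cost volume-based module is not reproduced here, but the sketch below illustrates one common construction consistent with the stated idea: a plane-sweep cost volume whose photometric matching cost is low only when depth, pose, and intrinsics are jointly consistent, so a soft-argmin pseudo depth can serve as an auxiliary supervision signal. It reuses the hypothetical reproject() helper from the previous sketch; depth hypotheses and all names are assumptions for illustration.

```python
# A minimal plane-sweep cost-volume sketch (an illustrative construction,
# not the authors' module). Each depth hypothesis warps the source view;
# mismatched intrinsics raise the matching cost everywhere, which is what
# makes the volume a useful auxiliary signal for intrinsics estimation.
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(target, source, pose, K, depth_bins):
    """Photometric cost for each hypothesized fronto-parallel depth plane.

    target, source: (B, C, H, W) image or feature maps
    pose:           (B, 4, 4) target->source transform
    K:              (B, 3, 3) predicted intrinsics
    depth_bins:     1-D tensor of D candidate depths
    Returns a (B, D, H, W) cost volume (lower = better match).
    """
    B, _, H, W = target.shape
    costs = []
    for d in depth_bins:
        # Constant-depth plane hypothesis shared by every pixel.
        depth = torch.full((B, 1, H, W), float(d), device=target.device)
        grid = reproject(depth, pose, K)
        warped = F.grid_sample(source, grid, padding_mode="border",
                               align_corners=True)
        costs.append((warped - target).abs().mean(dim=1))       # (B, H, W)
    return torch.stack(costs, dim=1)                            # (B, D, H, W)

def pseudo_depth_from_volume(cost_volume, depth_bins):
    """Differentiable soft-argmin over depth hypotheses; the resulting
    pseudo depth map can supervise the depth and intrinsics networks."""
    bins = depth_bins.to(cost_volume.device).view(1, -1, 1, 1)
    probs = torch.softmax(-cost_volume, dim=1)                  # (B, D, H, W)
    return (probs * bins).sum(dim=1, keepdim=True)              # (B, 1, H, W)
```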
Results: The experimental results demonstrated that the proposed method significantly improved the accuracy of ego-motion and depth prediction, even when compared with methods incorporating known camera intrinsics. In addition, by integrating our cost volume-based supervision, the accuracy of camera parameter estimation, including intrinsic parameters, was further enhanced.
Conclusions: We present a self-supervised system for depth, ego-motion, and intrinsic parameter estimation, effectively overcoming the limitations imposed by unknown or missing camera intrinsics. The experimental results confirm that the proposed method outperforms the baseline techniques, offering a robust solution for depth estimation in complex surgical video scenarios, with broader implications for improving image-guided surgery systems.
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, a field that continues to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal conditions. The scope of JMI includes:
- Imaging physics
- Tomographic reconstruction algorithms (such as those in CT and MRI)
- Image processing and deep learning
- Computer-aided diagnosis and quantitative image analysis
- Visualization and modeling
- Picture archiving and communication systems (PACS)
- Image perception and observer performance
- Technology assessment
- Ultrasonic imaging
- Image-guided procedures
- Digital pathology
- Biomedical applications of biomedical imaging
JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.