{"title":"WS-SfMLearner: self-supervised monocular depth and ego-motion estimation on surgical videos with unknown camera parameters.","authors":"Ange Lou, Jack Noble","doi":"10.1117/1.JMI.12.2.025003","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Accurate depth estimation in surgical videos is a pivotal component of numerous image-guided surgery procedures. However, creating ground truth depth maps for surgical videos is often infeasible due to challenges such as inconsistent illumination and sensor noise. As a result, self-supervised depth and ego-motion estimation frameworks are gaining traction, eliminating the need for manually annotated depth maps. Despite the progress, current self-supervised methods still rely on known camera intrinsic parameters, which are frequently unavailable or unrecorded in surgical environments. We address this gap by introducing a self-supervised system capable of jointly predicting depth maps, camera poses, and intrinsic parameters, providing a comprehensive solution for depth estimation under such constraints.</p><p><strong>Approach: </strong>We developed a self-supervised depth and ego-motion estimation framework, incorporating a cost volume-based auxiliary supervision module. This module provides additional supervision for predicting camera intrinsic parameters, allowing for robust estimation even without predefined intrinsics. The system was rigorously evaluated on a public dataset to assess its effectiveness in simultaneously predicting depth, camera pose, and intrinsic parameters.</p><p><strong>Results: </strong>The experimental results demonstrated that the proposed method significantly improved the accuracy of ego-motion and depth prediction, even when compared with methods incorporating known camera intrinsics. In addition, by integrating our cost volume-based supervision, the accuracy of camera parameter estimation, including intrinsic parameters, was further enhanced.</p><p><strong>Conclusions: </strong>We present a self-supervised system for depth, ego-motion, and intrinsic parameter estimation, effectively overcoming the limitations imposed by unknown or missing camera intrinsics. The experimental results confirm that the proposed method outperforms the baseline techniques, offering a robust solution for depth estimation in complex surgical video scenarios, with broader implications for improving image-guided surgery systems.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 2","pages":"025003"},"PeriodicalIF":1.9000,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12041500/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1117/1.JMI.12.2.025003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/30 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Abstract
Purpose: Accurate depth estimation in surgical videos is a pivotal component of numerous image-guided surgery procedures. However, creating ground truth depth maps for surgical videos is often infeasible due to challenges such as inconsistent illumination and sensor noise. As a result, self-supervised depth and ego-motion estimation frameworks are gaining traction, eliminating the need for manually annotated depth maps. Despite the progress, current self-supervised methods still rely on known camera intrinsic parameters, which are frequently unavailable or unrecorded in surgical environments. We address this gap by introducing a self-supervised system capable of jointly predicting depth maps, camera poses, and intrinsic parameters, providing a comprehensive solution for depth estimation under such constraints.
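To make the joint-prediction idea concrete, the following is a minimal PyTorch sketch, not the authors' code, of the standard self-supervised view-synthesis objective used by SfMLearner-style methods, with the intrinsics K taken as a network output rather than a fixed calibration. All function names, shapes, and numerical details are illustrative assumptions.

```python
# A minimal sketch (not the authors' implementation) of self-supervised
# view synthesis with *predicted* intrinsics: depth, pose, and K are all
# network outputs, and the photometric reprojection error supervises them.
import torch
import torch.nn.functional as F

def reproject(depth, pose, K):
    """Warp target pixel coordinates into the source view.

    depth: (B, 1, H, W) predicted depth for the target frame
    pose:  (B, 4, 4) predicted target->source rigid transform
    K:     (B, 3, 3) predicted camera intrinsics (hypothetical network head)
    Returns a (B, H, W, 2) sampling grid in [-1, 1] for F.grid_sample.
    """
    B, _, H, W = depth.shape
    device = depth.device
    # Pixel grid in homogeneous coordinates: (3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1)                    # (B, 3, H*W)

    # Backproject to 3-D using the predicted depth and intrinsics.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(B, 1, -1)   # (B, 3, H*W)

    # Rigid transform into the source camera, then project with K again.
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=device)], dim=1)
    src = K @ (pose @ cam_h)[:, :3]                             # (B, 3, H*W)
    uv = src[:, :2] / src[:, 2:3].clamp(min=1e-6)

    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    return torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

def photometric_loss(target, source, depth, pose, K):
    """L1 photometric error between the target frame and the source frame
    warped through the predicted geometry; gradients flow to depth, pose,
    and the predicted intrinsics alike."""
    grid = reproject(depth, pose, K)
    warped = F.grid_sample(source, grid, padding_mode="border",
                           align_corners=True)
    return (warped - target).abs().mean()
```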
Approach: We developed a self-supervised depth and ego-motion estimation framework, incorporating a cost volume-based auxiliary supervision module. This module provides additional supervision for predicting camera intrinsic parameters, allowing for robust estimation even without predefined intrinsics. The system was rigorously evaluated on a public dataset to assess its effectiveness in simultaneously predicting depth, camera pose, and intrinsic parameters.
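The paper's exact cost volume-based module is not reproduced here, but the sketch below illustrates one common construction consistent with the stated idea: a plane-sweep cost volume whose photometric matching cost is low only when depth, pose, and intrinsics are jointly consistent, so a soft-argmin pseudo depth can serve as an auxiliary supervision signal. It reuses the hypothetical reproject() helper from the previous sketch; depth hypotheses and all names are assumptions for illustration.

```python
# A minimal plane-sweep cost-volume sketch (an illustrative construction,
# not the authors' module). Each depth hypothesis warps the source view;
# mismatched intrinsics raise the matching cost everywhere, which is what
# makes the volume a useful auxiliary signal for intrinsics estimation.
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(target, source, pose, K, depth_bins):
    """Photometric cost for each hypothesized fronto-parallel depth plane.

    target, source: (B, C, H, W) image or feature maps
    pose:           (B, 4, 4) target->source transform
    K:              (B, 3, 3) predicted intrinsics
    depth_bins:     1-D tensor of D candidate depths
    Returns a (B, D, H, W) cost volume (lower = better match).
    """
    B, _, H, W = target.shape
    costs = []
    for d in depth_bins:
        # Constant-depth plane hypothesis shared by every pixel.
        depth = torch.full((B, 1, H, W), float(d), device=target.device)
        grid = reproject(depth, pose, K)
        warped = F.grid_sample(source, grid, padding_mode="border",
                               align_corners=True)
        costs.append((warped - target).abs().mean(dim=1))       # (B, H, W)
    return torch.stack(costs, dim=1)                            # (B, D, H, W)

def pseudo_depth_from_volume(cost_volume, depth_bins):
    """Differentiable soft-argmin over depth hypotheses; the resulting
    pseudo depth map can supervise the depth and intrinsics networks."""
    bins = depth_bins.to(cost_volume.device).view(1, -1, 1, 1)
    probs = torch.softmax(-cost_volume, dim=1)                  # (B, D, H, W)
    return (probs * bins).sum(dim=1, keepdim=True)              # (B, 1, H, W)
```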
Results: The experimental results demonstrated that the proposed method significantly improved the accuracy of ego-motion and depth prediction, even when compared with methods incorporating known camera intrinsics. In addition, by integrating our cost volume-based supervision, the accuracy of camera parameter estimation, including intrinsic parameters, was further enhanced.
Conclusions: We present a self-supervised system for depth, ego-motion, and intrinsic parameter estimation, effectively overcoming the limitations imposed by unknown or missing camera intrinsics. The experimental results confirm that the proposed method outperforms the baseline techniques, offering a robust solution for depth estimation in complex surgical video scenarios, with broader implications for improving image-guided surgery systems.
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, a field that continues to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal conditions. The scope of JMI includes:
- Imaging physics
- Tomographic reconstruction algorithms (such as those in CT and MRI)
- Image processing and deep learning
- Computer-aided diagnosis and quantitative image analysis
- Visualization and modeling
- Picture archiving and communication systems (PACS)
- Image perception and observer performance
- Technology assessment
- Ultrasonic imaging
- Image-guided procedures
- Digital pathology
- Biomedical applications of biomedical imaging
JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.