Min Tan, Yushun Tao, Boyun Zheng, Gaosheng Xie, Zeyang Xia, Jing Xiong
Computerized Medical Imaging and Graphics, vol. 125, Article 102640. Published 2025-09-06. Impact factor 4.9; JCR Q1 (Engineering, Biomedical).
DOI: 10.1016/j.compmedimag.2025.102640
https://www.sciencedirect.com/science/article/pii/S0895611125001491
Accurate and fast monocular endoscopic depth estimation of structure-content integrated diffusion
Endoscopic depth estimation is crucial for video understanding, robotic navigation, and 3D reconstruction in minimally invasive surgery. However, existing monocular depth estimation methods often struggle with the challenging conditions of endoscopic imagery, such as complex illumination, narrow luminal spaces, and low-contrast surfaces, resulting in inaccurate depth predictions. To address these challenges, we propose Structure-Content Integrated Diffusion Estimation (SCIDE) for accurate and fast endoscopic depth estimation. Specifically, we introduce the Structure Content Extractor (SC-Extractor), a module designed to extract structure and content priors that guide depth estimation in endoscopic environments. Additionally, we propose the Fast Optimized Diffusion Sampler (FODS) to meet the real-time needs of endoscopic surgery. FODS is a general sampling mechanism that optimizes the selection of time steps in diffusion models. SCIDE achieves an RMSE of 0.0875, and using FODS reduces inference time by 74.2%. These results demonstrate that the SCIDE framework achieves state-of-the-art accuracy in endoscopic depth estimation and makes real-time application feasible in endoscopic surgery. Project page: https://misrobotx.github.io/scide/
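To make the reported quantities concrete, the following is a minimal Python sketch of how an RMSE and a relative inference-time reduction are typically computed, along with a generic evenly spaced timestep subset for a diffusion sampler. It is an illustration only: the function names, the toy depth values, and the timings are hypothetical, and the uniform-stride schedule is a common baseline, not the FODS selection procedure, which the abstract does not specify.

```python
import math


def rmse(pred, gt):
    """Root-mean-square error between two equally sized flat sequences of depths."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))


def relative_time_reduction(baseline_s, fast_s):
    """Fraction of inference time saved relative to the baseline sampler."""
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return (baseline_s - fast_s) / baseline_s


def uniform_timestep_subset(num_train_steps, num_sample_steps):
    """Evenly spaced timesteps in descending order.

    Generic uniform-stride baseline (assumption); NOT the FODS selection rule.
    """
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sample_steps <= num_train_steps")
    if num_sample_steps == 1:
        return [num_train_steps - 1]
    stride = (num_train_steps - 1) / (num_sample_steps - 1)
    steps = [round(i * stride) for i in range(num_sample_steps)]
    return steps[::-1]


if __name__ == "__main__":
    # Toy depth values, invented purely for illustration.
    print(rmse([0.50, 0.62, 0.71], [0.48, 0.65, 0.70]))
    # Hypothetical timings chosen so the saving matches a 74.2% reduction.
    print(relative_time_reduction(1.0, 0.258))
    print(uniform_timestep_subset(1000, 10))
```

Under these assumptions, a 74.2% reduction means the faster sampler takes roughly a quarter of the baseline time per frame; the absolute timings depend on hardware and are not given in the abstract.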
About the journal:
The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.