Xue Li, Wenxin Chen, Xingguang Duan, Xiaoyi Gu, Changsheng Li
Medical Image Analysis, Volume 107, Article 103765 (published 2025-08-18). DOI: 10.1016/j.media.2025.103765. https://www.sciencedirect.com/science/article/pii/S1361841525003111
Collaborative surgical instrument segmentation for monocular depth estimation in minimally invasive surgery
Depth estimation is essential for image-guided surgical procedures, particularly in minimally invasive environments where accurate 3D perception is critical. This paper proposes a two-stage self-supervised monocular depth estimation framework that incorporates instrument segmentation as a task-level prior for improved spatial understanding. In the first stage, the segmentation and depth estimation models are trained separately on the RIS and SCARED datasets to learn task-specific representations. In the second stage, segmentation masks predicted on the dVPN dataset are fused with the RGB inputs to guide the refinement of the depth predictions.
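The abstract says only that predicted masks are "fused with RGB inputs". A common way to do this is early fusion: the mask is appended to each pixel as an extra input channel. The sketch below assumes that approach, and it is not confirmed as the authors' method. The function name `fuse_rgb_and_mask` and the optional 0.5 binarization threshold are hypothetical. Images are plain nested lists (H x W x C), so the example needs no third-party libraries.

```python
from typing import List, Optional

Image = List[List[List[float]]]  # H x W x C, e.g. C = 3 for RGB
Mask = List[List[float]]         # H x W, predicted instrument probabilities

def fuse_rgb_and_mask(rgb: Image, mask_prob: Mask,
                      threshold: Optional[float] = 0.5) -> Image:
    """Hypothetical early fusion: append the instrument mask as a 4th channel.

    If `threshold` is given, probabilities are binarized (1.0 = instrument);
    if it is None, the soft probabilities are kept as-is.
    """
    if len(rgb) != len(mask_prob) or any(
        len(r) != len(m) for r, m in zip(rgb, mask_prob)
    ):
        raise ValueError("RGB image and mask must have the same height and width")

    def mask_value(p: float) -> float:
        return float(p >= threshold) if threshold is not None else float(p)

    # Build a new H x W x (C + 1) image; the inputs are not modified.
    return [
        [list(pixel) + [mask_value(p)] for pixel, p in zip(row, mask_row)]
        for row, mask_row in zip(rgb, mask_prob)
    ]
```

For a 1 x 2 image, `fuse_rgb_and_mask([[[0.2, 0.3, 0.4], [0.5, 0.5, 0.5]]], [[0.9, 0.1]])` gives `[[[0.2, 0.3, 0.4, 1.0], [0.5, 0.5, 0.5, 0.0]]]`. Keeping soft probabilities (`threshold=None`) would let a depth network weigh uncertain mask boundaries, at the cost of a less clean prior.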
The framework uses a shared encoder with multiple task-specific decoders, so that features are shared efficiently across tasks. Comprehensive experiments on the RIS, SCARED, dVPN, and SERV-CT datasets validate the effectiveness and generalizability of the proposed approach. The results show that segmentation-aware depth estimation improves geometric reasoning in challenging surgical scenes, including those with occlusions and specular regions.
About the journal:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.