Trimming-then-augmentation: Towards robust depth and odometry estimation for endoscopic images
Junyang Wu, Yun Gu, Guang-Zhong Yang
Medical Image Analysis, Volume 107, Article 103736 (published 2025-09-03)
DOI: 10.1016/j.media.2025.103736
https://www.sciencedirect.com/science/article/pii/S136184152500283X
Abstract
Depth and odometry estimation for endoscopic imaging is an essential task for robot-assisted endoluminal intervention. Due to the difficulty of obtaining sufficient in vivo ground-truth data, unsupervised learning is preferred in practical settings. Existing methods, however, are hampered by imaging artifacts and the paucity of unique anatomical markers, coupled with tissue motion and specular reflections, leading to poor accuracy and generalizability. In this work, a trimming-then-augmentation framework is proposed. It uses a “mask-then-recover” training strategy to first mask out the artifact regions and then reconstruct the depth and pose information based on the global perception of a convolutional network. Subsequently, an augmentation module is used to provide stable correspondence between endoscopic image pairs. A task-specific loss function guides the augmentation module to adaptively establish stable feature pairs, improving the overall accuracy of the subsequent 3D structural reconstruction. Detailed validation shows that the proposed method can significantly improve the accuracy of existing state-of-the-art unsupervised methods, demonstrating its effectiveness and resilience to image artifacts, as well as its stability when applied in vivo.
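The “mask-then-recover” idea depends on keeping artifact pixels, such as specular reflections, from corrupting the self-supervised training signal. The sketch below illustrates only that trimming step, under stated assumptions: it flags likely specular highlights with a simple brightness/saturation threshold (`v_min` and `s_max` are illustrative values, not taken from the paper) and excludes them from a per-pixel L1 photometric error between a target frame and a source frame assumed to be already warped into the target view. The function names `specular_mask` and `masked_photometric_l1` are hypothetical, the depth-and-pose warping is omitted, and this generic heuristic is not the authors' masking mechanism or their recovery network.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel in [0, 1]
Image = List[List[Pixel]]


def specular_mask(img: Image, v_min: float = 0.9, s_max: float = 0.2) -> List[List[bool]]:
    """Flag likely specular highlights: very bright, nearly colourless pixels.

    Hypothetical helper; thresholds are illustrative assumptions.
    """
    mask = []
    for row in img:
        mask_row = []
        for r, g, b in row:
            _, s, v = colorsys.rgb_to_hsv(r, g, b)  # HSV value and saturation
            mask_row.append(v >= v_min and s <= s_max)
        mask.append(mask_row)
    return mask


def masked_photometric_l1(target: Image, warped: Image,
                          v_min: float = 0.9, s_max: float = 0.2) -> float:
    """Mean absolute RGB error over pixels not flagged as specular in either image.

    `warped` stands for the source frame already resampled into the target
    view using predicted depth and pose (that warping step is not shown).
    Returns 0.0 if every pixel is masked out.
    """
    m_t = specular_mask(target, v_min, s_max)
    m_w = specular_mask(warped, v_min, s_max)
    total, count = 0.0, 0
    for y, (row_t, row_w) in enumerate(zip(target, warped)):
        for x, (pt, pw) in enumerate(zip(row_t, row_w)):
            if m_t[y][x] or m_w[y][x]:
                continue  # trimmed: this pixel gives no training signal
            total += sum(abs(a - b) for a, b in zip(pt, pw)) / 3.0
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    tissue = (0.6, 0.3, 0.3)   # reddish, mucosa-like pixel
    glare = (1.0, 0.98, 0.97)  # bright, nearly white highlight
    target = [[tissue, glare]]
    warped = [[(0.5, 0.3, 0.3), (0.2, 0.1, 0.1)]]
    print(masked_photometric_l1(target, warped))  # about 0.0333; glare pixel ignored
```

In the usage example only the tissue pixel contributes, so the loss is not dominated by the large error at the glare pixel. A fixed threshold is the simplest possible detector; real endoscopic footage would likely call for a learned or adaptive mask, which is closer in spirit to what the abstract describes.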
About the journal:
Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.