Trimming-then-augmentation: Towards robust depth and odometry estimation for endoscopic images

IF 11.8 · Zone 1 (Medicine) · Q1 Computer Science, Artificial Intelligence

Medical Image Analysis · Pub Date: 2025-09-03 · DOI: 10.1016/j.media.2025.103736

Junyang Wu, Yun Gu, Guang-Zhong Yang

Medical Image Analysis, vol. 107, Article 103736. https://www.sciencedirect.com/science/article/pii/S136184152500283X

Citations: 0

Abstract

Depth and odometry estimation for endoscopic imaging is an essential task for robot-assisted endoluminal intervention. Because sufficient in vivo ground truth data are difficult to obtain, unsupervised learning is preferred in practical settings. Existing methods, however, are hampered by imaging artifacts and the paucity of unique anatomical markers, coupled with tissue motion and specular reflections, leading to poor accuracy and generalizability. In this work, a trimming-then-augmentation framework is proposed. It uses a “mask-then-recover” training strategy that first masks out the artifact regions and then reconstructs the depth and pose information based on the global perception of a convolutional network. Subsequently, an augmentation module is used to provide stable correspondence between endoscopic image pairs. A task-specific loss function guides the augmentation module to adaptively establish stable feature pairs, improving the overall accuracy of subsequent 3D structural reconstruction. Detailed validation shows that the proposed method can significantly improve the accuracy of existing state-of-the-art unsupervised methods, demonstrating its effectiveness, its resilience to image artifacts, and its stability when applied to in vivo settings.
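The abstract does not say how artifact regions are detected or how the training loss is formed, so the following is only an illustrative sketch of the "mask" half of the idea, not the authors' implementation. It assumes, hypothetically, that specular highlights can be flagged with a simple brightness threshold and that training uses a per-pixel photometric difference between a target frame and a view warped from a neighbouring frame, as is common in unsupervised depth and pose learning. The names `specular_mask`, `masked_l1`, and the threshold value 0.9 are invented for this example. The "recover" step, in which the network fills in depth for the masked regions from surrounding context, is not shown.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities in [0, 1]


def specular_mask(image: Image, threshold: float = 0.9) -> List[List[bool]]:
    """Flag pixels at or above `threshold` as artifacts (True = masked out).

    The brightness threshold is a hypothetical stand-in for whatever
    artifact detector a real system would use.
    """
    return [[pixel >= threshold for pixel in row] for row in image]


def masked_l1(target: Image, warped: Image, mask: List[List[bool]]) -> float:
    """Mean absolute difference over pixels that are NOT masked out."""
    total, count = 0.0, 0
    for t_row, w_row, m_row in zip(target, warped, mask):
        for t, w, m in zip(t_row, w_row, m_row):
            if not m:
                total += abs(t - w)
                count += 1
    # Return 0.0 when every pixel is masked, to avoid dividing by zero.
    return total / count if count else 0.0


# Toy 2x2 frames: the top-right target pixel is a bright specular highlight.
target = [[0.2, 0.95], [0.4, 0.5]]
warped = [[0.3, 0.10], [0.4, 0.7]]

mask = specular_mask(target)
loss = masked_l1(target, warped, mask)
```

In this toy case the highlight pixel would contribute an error of 0.85 if it were left in, dominating the loss; excluding it means only the three remaining pixels drive the training signal.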

Source journal

Medical Image Analysis (Engineering & Technology: Biomedical Engineering)

CiteScore: 22.10
Self-citation rate: 6.40%
Articles published: 309
Review time: 6.6 months

About the journal: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.