Learning dissection trajectories from expert surgical videos via imitation learning with equivariant diffusion

IF 10.7 · Zone 1 (Medicine) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Medical Image Analysis · Pub Date: 2025-05-10 · DOI: 10.1016/j.media.2025.103599

Hongyu Wang, Yonghao Long, Yueyao Chen, Hon-Chi Yip, Markus Scheppach, Philip Wai-Yan Chiu, Yeung Yam, Helen Mei-Ling Meng, Qi Dou

Volume 103, Article 103599. Full text: https://www.sciencedirect.com/science/article/pii/S136184152500146X

Citations: 0

Abstract

Endoscopic Submucosal Dissection (ESD) is a well-established endoscopic resection technique for removing epithelial lesions. Predicting dissection trajectories in ESD videos has the potential to strengthen and simplify surgical skills training, yet it has seldom been explored in previous research. While imitation learning has proven effective for learning skills from expert demonstrations, it has difficulty predicting uncertain future movements, learning geometric symmetries, and generalizing to diverse surgical scenarios. This paper introduces imitation learning for the critical task of predicting dissection trajectories from expert video demonstrations. We propose a novel Implicit Diffusion Policy with Equivariant Representations for Imitation Learning (iDPOE) to address these difficulties. Our method implicitly models expert behaviors using a joint state–action distribution, capturing the inherent stochasticity of future dissection trajectories and enabling robust visual representation learning across various endoscopic views. By incorporating a diffusion model in policy learning, our approach facilitates efficient training and sampling, resulting in more accurate predictions and improved generalization. Additionally, we integrate equivariance into the learning process to enhance the model's ability to generalize to geometric symmetries in trajectory prediction. To enable conditional sampling from the implicit policy, we develop a forward-process guided action inference strategy to correct state mismatches. We evaluated our method on a collected ESD video dataset comprising nearly 2000 clips. Experimental results demonstrate that our approach outperforms both explicit and implicit state-of-the-art methods in trajectory prediction. To the best of our knowledge, this is the first attempt to apply imitation learning-based techniques to surgical skill learning in the form of dissection trajectory prediction.
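The equivariance property mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not the authors' iDPOE implementation; it only shows what it means for a trajectory predictor f to be equivariant to image-plane rotations, i.e. f(R·x) = R·f(x). All names (`rotate`, `toy_predictor`, `biased_predictor`, `equivariance_error`) are hypothetical, and the predictors are toy stand-ins assumed purely for illustration.

```python
import numpy as np


def rotate(points: np.ndarray, k: int) -> np.ndarray:
    """Rotate 2D points of shape (N, 2) by k * 90 degrees about the origin."""
    theta = k * np.pi / 2.0
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s], [s, c]])
    return points @ rotation.T


def toy_predictor(history: np.ndarray, horizon: int = 4) -> np.ndarray:
    """Toy stand-in for a policy: extrapolate the last observed velocity."""
    velocity = history[-1] - history[-2]
    steps = np.arange(1, horizon + 1)[:, None]
    return history[-1] + steps * velocity


def biased_predictor(history: np.ndarray, horizon: int = 4) -> np.ndarray:
    """Same as toy_predictor plus a fixed offset, which breaks equivariance."""
    return toy_predictor(history, horizon) + np.array([1.0, 0.0])


def equivariance_error(predict, history: np.ndarray, k: int) -> float:
    """Largest gap between predict(rotate(x)) and rotate(predict(x))."""
    rotated_then_predicted = predict(rotate(history, k))
    predicted_then_rotated = rotate(predict(history), k)
    return float(np.max(np.abs(rotated_then_predicted - predicted_then_rotated)))


# Hypothetical past tool-tip positions in image coordinates.
history = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.5]])

for k in range(4):
    print(k, equivariance_error(toy_predictor, history, k),
          equivariance_error(biased_predictor, history, k))
```

The linear extrapolator commutes with rotation, so its error stays at floating-point noise, while the predictor with a fixed offset does not. A model with equivariance built in would behave like the former without needing to see every rotated view during training.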




Source journal

Medical Image Analysis (Engineering and Technology, Engineering: Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.