Recurrent multi-view 6DoF pose estimation for marker-less surgical tool tracking.

IF 2.3 · Zone 3 (Medicine) · Q3 ENGINEERING, BIOMEDICAL

International Journal of Computer Assisted Radiology and Surgery. Pub Date: 2025-08-01. Epub Date: 2025-06-17. DOI: 10.1007/s11548-025-03436-8

Niklas Agethen, Janis Rosskamp, Tom L Koller, Jan Klein, Gabriel Zachmann

Pages: 1589-1599

Citations: 0

Abstract

Purpose: Marker-based tracking of surgical instruments enables high-precision surgical navigation, but it requires time-consuming preparation and is vulnerable to stained or occluded markers. Deep learning promises marker-less tracking based solely on RGB videos to address these challenges. In this paper, object pose estimation is applied to surgical instrument tracking using a novel deep learning architecture.

Methods: We combine pose estimation from multiple views with recurrent neural networks to better exploit temporal coherence for improved tracking. We also investigate the performance under conditions where the instrument is obscured. We extend an existing pose (distribution) estimation pipeline with a spatio-temporal feature extractor that incorporates features across an entire sequence of frames.
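To make the idea of combining multiple views with recurrence concrete, the following is a minimal toy sketch, not the authors' architecture. It assumes that per-camera feature vectors are fused by simple averaging and that an Elman-style recurrent update with fixed scalar weights carries information across frames; the names `fuse_views`, `recurrent_step`, `encode_sequence` and the weights `w_h`, `w_x` are hypothetical and chosen only for illustration. A real system would use learned networks for both steps.

```python
import math

def fuse_views(view_features):
    """Average the per-camera feature vectors of one frame (assumed fusion rule)."""
    n_views = len(view_features)
    dim = len(view_features[0])
    return [sum(f[k] for f in view_features) / n_views for k in range(dim)]

def recurrent_step(hidden, fused, w_h=0.5, w_x=0.5):
    """Elman-style update with fixed scalar weights (illustrative, not learned):
    h_t = tanh(w_h * h_{t-1} + w_x * x_t)."""
    return [math.tanh(w_h * h + w_x * x) for h, x in zip(hidden, fused)]

def encode_sequence(frames):
    """frames: list over time; each entry is a list over cameras of feature vectors.
    Returns the hidden state after every frame."""
    hidden = [0.0] * len(frames[0][0])
    states = []
    for views in frames:
        hidden = recurrent_step(hidden, fuse_views(views))
        states.append(hidden)
    return states

# Frame 0: instrument visible in all four cameras.
# Frame 1: instrument fully occluded (all-zero features, an assumed stand-in for occlusion).
visible = [[1.0, -1.0]] * 4
occluded = [[0.0, 0.0]] * 4
states = encode_sequence([visible, occluded])
```

In this toy setting the hidden state after the occluded frame is still non-zero, because it retains information from the earlier visible frame. That is the intuition behind using temporal context when the instrument is obscured.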

Results: On a synthetic dataset we achieve a mean tip error below 1.0 mm and an angle error below 0.2° using a four-camera setup. On a real dataset with four cameras we achieve an error below 3.0 mm. Under limited instrument visibility our recurrent approach can predict the tip position approximately 3 mm more precisely than the non-recurrent approach.
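The abstract does not define how tip error and angle error are computed. As a hedged illustration, the sketch below assumes a common convention: the tip error is the Euclidean distance between the estimated and ground-truth tip positions (the tip being a fixed offset in the instrument frame), and the angle error is the geodesic angle between the estimated and ground-truth rotations. The function names and the `tip_offset` parameter are hypothetical, and the paper may use different definitions.

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 rotation matrix (nested lists) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def tip_position(R, t, tip_offset):
    """Tip position for pose (R, t): R @ tip_offset + t."""
    rotated = mat_vec(R, tip_offset)
    return [rotated[i] + t[i] for i in range(3)]

def tip_error(R_est, t_est, R_gt, t_gt, tip_offset):
    """Euclidean distance between estimated and ground-truth tip (same unit as t)."""
    return math.dist(tip_position(R_est, t_est, tip_offset),
                     tip_position(R_gt, t_gt, tip_offset))

def angle_error_deg(R_est, R_gt):
    """Geodesic rotation angle in degrees: acos((trace(R_est^T R_gt) - 1) / 2)."""
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numerical safety
    return math.degrees(math.acos(cos_angle))

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees (helper for the example)."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

# Example: estimate is off by 0.5 mm in x and by 0.2 degrees about z.
identity = rot_z(0.0)
err_mm = tip_error(identity, [0.5, 0.0, 0.0], identity, [0.0, 0.0, 0.0], [0.0, 0.0, 100.0])
err_deg = angle_error_deg(rot_z(0.2), identity)
```

Note that the tip error couples translation and rotation: for a tip offset that is not on the rotation axis, even a small angular error moves the tip, and the displacement grows with the length of the instrument.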

Conclusion: Our findings on a synthetic dataset of surgical instruments demonstrate that deep-learning-based tracking using multiple cameras simultaneously can be competitive with marker-based systems. Additionally, the temporal information obtained through the architecture's recurrent nature is advantageous when the instrument is occluded. The synthesis of multi-view and recurrence has thus been shown to enhance the reliability and usability of high-precision surgical pose estimation.

Source journal

International Journal of Computer Assisted Radiology and Surgery (ENGINEERING, BIOMEDICAL; RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore: 5.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 6-12 weeks

About the journal: The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.