{"title":"The RoDEM benchmark: evaluating the robustness of monocular single-shot depth estimation methods in minimally-invasive surgery.","authors":"Rasoul Sharifian, Navid Rabbani, Adrien Bartoli","doi":"10.1007/s11548-025-03375-4","DOIUrl":"https://doi.org/10.1007/s11548-025-03375-4","url":null,"abstract":"<p><strong>Purpose: </strong>Monocular Single-shot Depth Estimation (MoSDE) methods for Minimally-Invasive Surgery (MIS) are promising but their robustness in surgical conditions remains questionable. We introduce the RoDEM benchmark, comprising an advanced analysis of perturbations, a dataset acquired in realistic MIS conditions and metrics. The dataset consists of 29,803 ex-vivo images including 44 video sequences with depth Ground-Truth covering clean conditions and nine perturbations. We give the performance evaluation of nine existing MoSDE methods.</p><p><strong>Methods: </strong>An RGB-D structured-light camera was firmly attached to a laparoscope. The two cameras were internally calibrated and the rigid transformation between them was estimated. Synchronised images and videos were captured while producing real perturbations in three settings. The depth maps were eventually transferred to the laparoscope viewpoint and the images categorised by perturbation severity.</p><p><strong>Results: </strong>The proposed metrics cover accuracy (clean condition performance) and robustness (resilience to perturbations). We found that foundation models demonstrated higher accuracy than the other methods. All methods were robust to motion blur and bright light. Methods trained on large datasets were robust against smoke, blood, and low light whereas the other methods exhibited reduced robustness. None of the methods coped with lens dirtiness and defocus blur.</p><p><strong>Conclusion: </strong>This study highlighted the importance of robustness evaluation in MoSDE as many existing methods showed reduced accuracy against common surgical perturbations. It emphasises the importance of training with large datasets including perturbations. The proposed benchmark gives a precise and detailed analysis of a method's performance in the MIS conditions. It will be made publicly available.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144057485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Objective skill assessment for cataract surgery from surgical microscope video
Rebecca Hisey, Henry Lee, Adrienne Duimering, John Liu, Vasudha Gupta, Tamas Ungi, Christine Law, Gabor Fichtinger, Matthew Holden
International Journal of Computer Assisted Radiology and Surgery, 2025-04-25. DOI: 10.1007/s11548-025-03366-5

Objective: Video offers an accessible method for automated surgical skill evaluation; however, many platforms still rely on traditional six-degree-of-freedom (6-DOF) tracking systems, which can be costly, cumbersome, and challenging to apply clinically. This study aims to demonstrate that trainee skill in cataract surgery can be assessed effectively using only object detection from monocular surgical microscope video.

Methods: One ophthalmologist and four residents each performed cataract surgery on a simulated eye five times, generating 25 recordings. Recordings included both the surgical microscope video and 6-DOF instrument tracking data. Videos were graded by two expert ophthalmologists using the ICO-OSCAR:SICS rubric. We computed motion-based metrics using both object detection from video and 6-DOF tracking. We first examined correlations between each metric and the expert scores for each rubric criterion. Using these findings, we then trained an ordinal regression model to predict scores from each tracking modality and compared correlation strengths with expert scores.

Results: Metrics from object detection generally showed stronger correlations with expert scores than those from 6-DOF tracking. For score prediction, 6-DOF tracking showed no significant advantage, while scores predicted from object detection achieved significantly stronger correlations with expert scores for four scoring criteria.

Conclusion: Our results indicate that skill assessment from monocular surgical microscope video can match, and in some cases exceed, the correlation strengths of 6-DOF tracking assessments. This finding supports the feasibility of using object detection for skill assessment without additional hardware.
The quiet eye phenomenon in minimally invasive surgery
Alaa Eldin Abdelaal, Rachelle Van Rumpt, Sayem Zaman, Irene Tong, Anthony Jarc, Gary L Gallia, Masaru Ishii, Gregory D Hager, Septimiu E Salcudean
International Journal of Computer Assisted Radiology and Surgery, 2025-04-24. DOI: 10.1007/s11548-025-03367-4

Purpose: The quiet eye (QE) behavior is a gaze behavior that has been extensively studied in sports training and has been associated with higher levels of expertise in multiple sports. In this paper, we report our observations of this gaze behavior in two minimally invasive surgery settings and describe how it changes with task success and with the surgeon's expertise level.

Methods: We investigated the QE behavior in two independently collected data sets, one from a sinus surgery setting and one from a robotic surgery setting. The sinus surgery data set was used to study how the QE behavior changes in successful and unsuccessful tasks. The robotic surgery data set was used to study how the QE behavior changes with the surgeon's expertise level.

Results: On the sinus surgery data set, our results show that the QE behavior is more likely to occur, and that its duration is significantly longer, in successful tasks compared with unsuccessful ones. On the robotic surgery data set, our results show similar trends in tasks performed by experienced surgeons compared with less experienced ones.

Conclusion: The results of our study open the door to using the QE behavior for training and skill assessment in the explored minimally invasive surgery settings. Training novices to adopt the QE behavior could improve their motor skill learning, replicating the success of doing so in sports training. In addition, the well-defined characteristics of the QE behavior provide an explainable way to distinguish between skill levels in minimally invasive surgery.
perfDSA: Automatic Perfusion Imaging in Cerebral Digital Subtraction Angiography
Ruisheng Su, P Matthijs van der Sluijs, Flavius-Gabriel Marc, Frank Te Nijenhuis, Sandra A P Cornelissen, Bob Roozenbeek, Wim H van Zwam, Aad van der Lugt, Danny Ruijters, Josien Pluim, Theo van Walsum
International Journal of Computer Assisted Radiology and Surgery, 2025-04-24. DOI: 10.1007/s11548-025-03359-4

Purpose: Thanks to its high spatio-temporal resolution, cerebral digital subtraction angiography (DSA) is a standard imaging technique in image-guided interventions for visualizing cerebral blood flow and guiding therapy. To date, cerebral perfusion characteristics in DSA are primarily assessed visually by interventionists, which is time-consuming, error-prone, and subjective. To facilitate fast and reproducible assessment of cerebral perfusion, this work develops and validates a fully automatic and quantitative framework for perfusion DSA.

Methods: We put forward a framework, perfDSA, that automatically generates deconvolution-based perfusion parametric images from cerebral DSA. It automatically extracts the arterial input function (AIF) from the supraclinoid internal carotid artery (ICA) and computes deconvolution-based perfusion parametric images including cerebral blood volume (CBV), cerebral blood flow (CBF), mean transit time (MTT), and Tmax.

Results: On a DSA dataset of 1006 patients from the multicenter MR CLEAN registry, perfDSA achieves a Dice score of 0.73 (±0.21) in segmenting the supraclinoid ICA, yielding AIF curves of accuracy similar to manual extraction. Moreover, some of the extracted perfusion images show statistically significant associations (P = 2.62e-5) with favorable functional outcomes in stroke patients.

Conclusion: The proposed perfDSA framework promises to aid therapeutic decision-making in cerebrovascular interventions and to facilitate the discovery of novel quantitative biomarkers in clinical practice. The code is available at https://github.com/RuishengSu/perfDSA .
Semantic hyperspectral image synthesis for cross-modality knowledge transfer in surgical data science
Viet Tran Ba, Marco Hübner, Ahmad Bin Qasim, Maike Rees, Jan Sellner, Silvia Seidlitz, Evangelia Christodoulou, Berkin Özdemir, Alexander Studier-Fischer, Felix Nickel, Leonardo Ayala, Lena Maier-Hein
International Journal of Computer Assisted Radiology and Surgery, 2025-04-24. DOI: 10.1007/s11548-025-03364-7

Purpose: Hyperspectral imaging (HSI) is a promising intraoperative imaging modality, with potential applications ranging from tissue classification and discrimination to perfusion monitoring and cancer detection. However, surgical HSI datasets are scarce, hindering the development of robust data-driven algorithms. The purpose of this work was to address this critical bottleneck with a novel approach to knowledge transfer across modalities.

Methods: We propose the use of generative modeling to leverage imaging data across optical imaging modalities. The core of the method is a latent diffusion model (LDM) capable of converting a semantic segmentation mask obtained from any modality into a realistic hyperspectral image, such that geometry information can be learned across modalities. The value of the approach was assessed both qualitatively and quantitatively using surgical scene segmentation as a downstream task.

Results: Our study with more than 13,000 hyperspectral images, partially annotated with a total of 37 tissue and object classes, suggests that LDMs are well suited for the synthesis of realistic high-resolution hyperspectral images, even when trained on few samples or applied to annotations from different modalities and geometric out-of-distribution annotations. Using our approach for generative augmentation yielded a performance boost of up to 35% in the Dice similarity coefficient for the task of semantic hyperspectral image segmentation.

Conclusion: As our method is capable of augmenting HSI datasets in a manner agnostic to the modality of the leveraged data, it could serve as a blueprint for addressing the data bottleneck encountered for novel imaging modalities.
{"title":"Feature distance-weighted adaptive decoupled knowledge distillation for medical image segmentation.","authors":"Xiangchun Yu, Ziyun Xiong, Miaomiao Liang, Lingjuan Yu, Jian Zheng","doi":"10.1007/s11548-025-03346-9","DOIUrl":"https://doi.org/10.1007/s11548-025-03346-9","url":null,"abstract":"<p><strong>Purpose: </strong>This paper aims to apply decoupled knowledge distillation (DKD) to medical image segmentation, focusing on transferring knowledge from a high-performance teacher network to a lightweight student network, thereby facilitating model deployment on embedded devices.</p><p><strong>Methods: </strong>We initially decouple the distillation loss into pixel-wise target class knowledge distillation (PTCKD) and pixel-wise non-target class knowledge distillation (PNCKD). Subsequently, to address the limitations of the fixed weight paradigm in PTCKD, we propose a novel feature distance-weighted adaptive decoupled knowledge distillation (FDWA-DKD) method. FDWA-DKD quantifies the feature disparity between student and teacher, generating instance-level adaptive weights for PTCKD. We design a feature distance weighting (FDW) module that dynamically calculates feature distance to obtain adaptive weights, integrating feature space distance information into logit distillation. Lastly, we introduce a class-wise feature probability distribution loss to encourage the student to mimic the teacher's spatial distribution.</p><p><strong>Results: </strong>Extensive experiments conducted on the Synapse and FLARE22 datasets demonstrate that our proposed FDWA-DKD achieves satisfactory performance, yielding optimal Dice scores and, in some instances, surpassing the performance of the teacher network. Ablation studies further validate the effectiveness of each module within our proposed method.</p><p><strong>Conclusion: </strong>Our method overcomes the constraints of traditional distillation methods by offering instance-level adaptive learning weights tailored to PTCKD. By quantifying student-teacher feature disparity and minimizing class-wise feature probability distribution loss, our method outperforms other distillation methods.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144042831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Early operative difficulty assessment in laparoscopic cholecystectomy via snapshot-centric video analysis
Saurav Sharma, Maria Vannucci, Leonardo Pestana Legori, Mario Scaglia, Giovanni Guglielmo Laracca, Didier Mutter, Sergio Alfieri, Pietro Mascagni, Nicolas Padoy
International Journal of Computer Assisted Radiology and Surgery, 2025-04-21. DOI: 10.1007/s11548-025-03372-7

Purpose: Laparoscopic cholecystectomy (LC) operative difficulty (LCOD) is highly variable and influences outcomes. Despite extensive LC studies in surgical workflow analysis, few efforts have explored LCOD using intraoperative video data. Early recognition of LCOD could allow prompt review by expert surgeons, enhance operating room (OR) planning, and improve surgical outcomes.

Methods: We propose the clinical task of early LCOD assessment using limited video observations. We design SurgPrOD, a deep learning model that assesses LCOD by analyzing features from global and local temporal resolutions (snapshots) of the observed LC video. We also propose a novel snapshot-centric attention (SCA) module, acting across snapshots, to enhance LCOD prediction. We introduce the CholeScore dataset, featuring video-level LCOD labels, to validate our method.

Results: We evaluate SurgPrOD on three LCOD assessment scales in the CholeScore dataset. On our new metric assessing early and stable correct predictions, SurgPrOD surpasses baselines by at least 0.22 points. SurgPrOD also improves over baselines by at least 9 and 5 percentage points in F1 score and top-1 accuracy, respectively, demonstrating its effectiveness in making correct predictions.

Conclusion: We propose a new task of early LCOD assessment and a novel model, SurgPrOD, that analyzes surgical video from global and local perspectives. Our results on the CholeScore dataset establish a new benchmark for studying LCOD from intraoperative video data.
Hybrid 3D augmented reality for image-guided therapy using autostereoscopic visualization
Viktor Vörös, Xuan Thao Ha, Wim-Alexander Beckers, Johan Bennett, Tom Kimpe, Emmanuel Vander Poorten
International Journal of Computer Assisted Radiology and Surgery, 2025-04-18. DOI: 10.1007/s11548-025-03357-6

Purpose: During image-guided therapy, cardiologists use 2-dimensional (2D) imaging modalities to navigate catheters, resulting in a loss of depth perception. Augmented reality (AR) is being explored to overcome this challenge by visualizing patient-specific 3D models or the 3D shape of the catheter. However, when this 3D content is presented on a 2D display, important depth information may be lost. This paper proposes a hybrid 3D AR visualization method combining stereo 3D AR guidance with conventional 2D modalities.

Methods: A cardiovascular catheterization simulator was developed, consisting of a phantom vascular model, a catheter with embedded shape sensing, and an autostereoscopic display. A user study involving interventional cardiologists (n = 5) and electrophysiologists (n = 2) was set up. The study compared the hybrid 3D AR guidance with simulated fluoroscopy and with 2D AR guidance in a catheter navigation task.

Results: Despite improvements in task time and traveled path length, the differences in performance were not significant. However, the number of incorrect artery entries was reduced by 50% with 2D AR and by 81% with hybrid 3D AR. The questionnaire results showed a reduced mental load and higher confidence with the proposed hybrid 3D AR guidance. All but one participant indicated feeling comfortable looking at the hybrid 3D view.

Conclusion: The findings suggest that AR guidance, particularly in a hybrid 3D visualization format, enhances spatial awareness and reduces mental load for cardiologists. The autostereoscopic 3D view proved superior for estimating the pose of the catheter and its relationship to the vascular model.
Text-driven adaptation of foundation models for few-shot surgical workflow analysis
Tingxuan Chen, Kun Yuan, Vinkle Srivastav, Nassir Navab, Nicolas Padoy
International Journal of Computer Assisted Radiology and Surgery, 2025-04-17. DOI: 10.1007/s11548-025-03341-0

Purpose: Surgical workflow analysis is crucial for improving surgical efficiency and safety. However, previous studies rely heavily on large-scale annotated datasets, posing challenges in cost, scalability, and reliance on expert annotations. To address this, we propose Surg-FTDA (Few-shot Text-driven Adaptation), designed to handle various surgical workflow analysis tasks with minimal paired image-label data.

Methods: Our approach has two key components. First, few-shot selection-based modality alignment selects a small subset of images and aligns their embeddings with text embeddings from the downstream task, bridging the modality gap. Second, text-driven adaptation leverages only text data to train a decoder, eliminating the need for paired image-text data. This decoder is then applied to the aligned image embeddings, enabling image-related tasks without explicit image-text pairs.

Results: We evaluate our approach on generative tasks (image captioning) and discriminative tasks (triplet recognition and phase recognition). The results show that Surg-FTDA outperforms baselines and generalizes well across downstream tasks.

Conclusion: We propose a text-driven adaptation approach that mitigates the modality gap and handles multiple downstream tasks in surgical workflow analysis, with minimal reliance on large annotated datasets. The code and dataset will be released at https://github.com/CAMMApublic/Surg-FTDA .
Video-based multi-target multi-camera tracking for postoperative phase recognition
Franziska Jurosch, Janik Zeller, Lars Wagner, Ege Özsoy, Alissa Jell, Sven Kolb, Dirk Wilhelm
International Journal of Computer Assisted Radiology and Surgery, 2025-04-12. DOI: 10.1007/s11548-025-03344-x

Purpose: Deep learning methods are commonly used to generate context understanding that supports surgeons and medical professionals. Expanding the current focus beyond the operating room (OR) to postoperative workflows enables new forms of assistance. In this article, we propose a novel multi-target multi-camera tracking (MTMCT) architecture for postoperative phase recognition, location tracking, and automatic timestamp generation.

Methods: Three RGB cameras were used to create a multi-camera dataset containing 19 reenacted postoperative patient flows. Patients and beds were annotated and used to train the custom MTMCT architecture. It includes bed and patient tracking for each camera and a postoperative patient state module that provides the postoperative phase, the current location of the patient, and automatically generated timestamps.

Results: The architecture demonstrates robust performance in single- and multi-patient scenarios by embedding medical domain-specific knowledge. In multi-patient scenarios, the state machine representing the postoperative phases has a traversal accuracy of 84.9 ± 6.0%, 91.4 ± 1.5% of timestamps are generated correctly, and the patient tracking IDF1 reaches 92.0 ± 3.6%. Comparative experiments show the effectiveness of using AFLink for matching partial trajectories in postoperative settings.

Conclusion: As our approach shows promising results, it lays the foundation for real-time surgeon support, enhancing clinical documentation and ultimately improving patient care.