{"title":"Quantifying the effects of delays on telerobotic surgical operability via brain activity measurements.","authors":"Junnosuke Ichihara, Satoshi Miura","doi":"10.1007/s11548-025-03487-x","DOIUrl":"https://doi.org/10.1007/s11548-025-03487-x","url":null,"abstract":"<p><strong>Purpose: </strong>Telesurgery, increasingly valued for enabling remote procedures post-COVID, can be critically affected by communication delays-typically negligible in conventional robot-assisted surgery due to surgeon-patient co-location. While previous studies have assessed the impact of delays on surgical performance, their effects on the operator's cognitive state remain unclear. Therefore, this study assessed delay-induced changes in telesurgery operability based on intraparietal sulcus (IPS) activity.</p><p><strong>Methods: </strong>A virtual-reality-based surgical assistance simulator was developed using the Unity game engine to replicate the da Vinci surgical robot and colorectal suturing environment. The simulator randomly introduced seven delay conditions to assess their impact on IPS activity during suturing. Eight right-handed participants, all of whom were non-medical students with no prior surgical experience, performed suturing while their IPS activity was measured using functional near-infrared spectroscopy. The left- and right-sided IPS activities were measured separately, and the task completion time and suturing error rate were also recorded for comparison.</p><p><strong>Results: </strong>Significance was assessed using the nonparametric Jonckheere-Terpstra test. Left- and right-sided IPS activities decreased significantly for 150-300 and 0-300 ms delays, respectively. The task completion time increased significantly for 0-300 ms delays, while the suturing error rate increased significantly for 0-100 ms delays.</p><p><strong>Conclusion: </strong>These findings confirm that IPS activity can be used to quantify delay-induced operability changes. For delays beyond 150 ms, significant IPS changes indicated that operators perceived degraded control. However, for delays of or shorter than 150 ms, the operators' precision unconsciously declined, indicating that greater caution is required in surgical tasks.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144785942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal consistency-aware network for renal artery segmentation in X-ray angiography.","authors":"Botao Yang, Chunming Li, Simone Fezzi, Zehao Fan, Runguo Wei, Yankai Chen, Domenico Tavella, Flavio L Ribichini, Su Zhang, Faisal Sharif, Shengxian Tu","doi":"10.1007/s11548-025-03486-y","DOIUrl":"https://doi.org/10.1007/s11548-025-03486-y","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate segmentation of renal arteries from X-ray angiography videos is crucial for evaluating renal sympathetic denervation (RDN) procedures but remains challenging due to dynamic changes in contrast concentration and vessel morphology across frames. The purpose of this study is to propose TCA-Net, a deep learning model that improves segmentation consistency by leveraging local and global contextual information in angiography videos.</p><p><strong>Methods: </strong>Our approach utilizes a novel deep learning framework that incorporates two key modules: a local temporal window vessel enhancement module and a global vessel refinement module (GVR). The local module fuses multi-scale temporal-spatial features to improve the semantic representation of vessels in the current frame, while the GVR module integrates decoupled attention strategies (video-level and object-level attention) and gating mechanisms to refine global vessel information and eliminate redundancy. To further improve segmentation consistency, a temporal perception consistency loss function is introduced during training.</p><p><strong>Results: </strong>We evaluated our model using 195 renal artery angiography sequences for development and tested it on an external dataset from 44 patients. The results demonstrate that TCA-Net achieves an F1-score of 0.8678 for segmenting renal arteries, outperforming existing state-of-the-art segmentation methods.</p><p><strong>Conclusion: </strong>We present TCA-Net, a deep learning-based model that significantly improves segmentation consistency for renal artery angiography videos. By effectively leveraging both local and global temporal contextual information, TCA-Net outperforms current methods and provides a reliable tool for assessing RDN procedures.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144769280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of tumor coverage after RF ablation of hepatocellular carcinoma using single 2D image slices.","authors":"Nicole Varble, Ming Li, Laetitia Saccenti, Tabea Borde, Antonio Arrichiello, Anna Christou, Katerina Lee, Lindsey Hazen, Sheng Xu, Riccardo Lencioni, Bradford J Wood","doi":"10.1007/s11548-025-03423-z","DOIUrl":"10.1007/s11548-025-03423-z","url":null,"abstract":"<p><strong>Purpose: </strong>To assess the technical success of radiofrequency ablation (RFA) in patients with hepatocellular carcinoma (HCC), an artificial intelligence (AI) model was developed to estimate the tumor coverage without the need for segmentation or registration tools.</p><p><strong>Methods: </strong>A secondary retrospective analysis of 550 patients in the multicenter and multinational OPTIMA trial (3-7 cm solidary HCC lesions, randomized to RFA or RFA + LTLD) identified 182 patients with well-defined pre-RFA tumor and 1-month post-RFA devascularized ablation zones on enhanced CT. The ground-truth, or percent tumor coverage, was determined based on the result of semi-automatic 3D tumor and ablation zone segmentation and elastic registration. The isocenter of the tumor and ablation was isolated on 2D axial CT images. Feature extraction was performed, and classification and linear regression models were built. Images were augmented, and 728 image pairs were used for training and testing. The estimated percent tumor coverage using the models was compared to ground-truth. Validation was performed on eight patient cases from a separate institution, where RFA was performed, and pre- and post-ablation images were collected.</p><p><strong>Results: </strong>In testing cohorts, the best model accuracy was with classification and moderate data augmentation (AUC = 0.86, TPR = 0.59, and TNR = 0.89, accuracy = 69%) and regression with random forest (RMSE = 12.6%, MAE = 9.8%). Validation in a separate institution did not achieve accuracy greater than random estimation. Visual review of training cases suggests that poor tumor coverage may be a result of atypical ablation zone shrinkage 1 month post-RFA, which may not be reflected in clinical utilization.</p><p><strong>Conclusion: </strong>An AI model that uses 2D images at the center of the tumor and 1 month post-ablation can accurately estimate ablation tumor coverage. In separate validation cohorts, translation could be challenging.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1653-1663"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350484/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144250775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recurrent multi-view 6DoF pose estimation for marker-less surgical tool tracking.","authors":"Niklas Agethen, Janis Rosskamp, Tom L Koller, Jan Klein, Gabriel Zachmann","doi":"10.1007/s11548-025-03436-8","DOIUrl":"10.1007/s11548-025-03436-8","url":null,"abstract":"<p><strong>Purpose: </strong>Marker-based tracking of surgical instruments facilitates surgical navigation systems with high precision, but requires time-consuming preparation and is prone to stains or occluded markers. Deep learning promises marker-less tracking based solely on RGB videos to address these challenges. In this paper, object pose estimation is applied to surgical instrument tracking using a novel deep learning architecture.</p><p><strong>Methods: </strong>We combine pose estimation from multiple views with recurrent neural networks to better exploit temporal coherence for improved tracking. We also investigate the performance under conditions where the instrument is obscured. We enhance an existing pose (distribution) estimation pipeline by a spatio-temporal feature extractor that allows for feature incorporation along an entire sequence of frames.</p><p><strong>Results: </strong>On a synthetic dataset we achieve a mean tip error below 1.0 mm and an angle error below 0.2 <math><mmultiscripts><mrow></mrow> <mrow></mrow> <mo>∘</mo></mmultiscripts> </math> using a four-camera setup. On a real dataset with four cameras we achieve an error below 3.0 mm. Under limited instrument visibility our recurrent approach can predict the tip position approximately 3 mm more precisely than the non-recurrent approach.</p><p><strong>Conclusion: </strong>Our findings on a synthetic dataset of surgical instruments demonstrate that deep-learning-based tracking using multiple cameras simultaneously can be competitive with marker-based systems. Additionally, the temporal information obtained through the architecture's recurrent nature is advantageous when the instrument is occluded. The synthesis of multi-view and recurrence has thus been shown to enhance the reliability and usability of high-precision surgical pose estimation.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1589-1599"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144318641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BronchoGAN: anatomically consistent and domain-agnostic image-to-image translation for video bronchoscopy.","authors":"Ahmad Soliman, Ron Keuth, Marian Himstedt","doi":"10.1007/s11548-025-03450-w","DOIUrl":"10.1007/s11548-025-03450-w","url":null,"abstract":"<p><p>Purpose The limited availability of bronchoscopy images makes image synthesis particularly interesting for training deep learning models. Robust image translation across different domains-virtual bronchoscopy, phantom as well as in vivo and ex vivo image data-is pivotal for clinical applications. Methods This paper proposes BronchoGAN introducing anatomical constraints for image-to-image translation being integrated into a conditional GAN. In particular, we force bronchial orifices to match across input and output images. We further propose to use foundation model-generated depth images as intermediate representation ensuring robustness across a variety of input domains establishing models with substantially less reliance on individual training datasets. Moreover, our intermediate depth image representation allows to easily construct paired image data for training. Results Our experiments showed that input images from different domains (e.g., virtual bronchoscopy, phantoms) can be successfully translated to images mimicking realistic human airway appearance. We demonstrated that anatomical settings (i.e., bronchial orifices) can be robustly preserved with our approach which is shown qualitatively and quantitatively by means of improved FID, SSIM and dice coefficients scores. Our anatomical constraints enabled an improvement in the Dice coefficient of up to 0.43 for synthetic images. Conclusion Through foundation models for intermediate depth representations and bronchial orifice segmentation integrated as anatomical constraints into conditional GANs, we are able to robustly translate images from different bronchoscopy input domains. BronchoGAN allows to incorporate public CT scan data (virtual bronchoscopy) in order to generate large-scale bronchoscopy image datasets with realistic appearance. BronchoGAN enables to bridge the gap of missing public bronchoscopy images.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1741-1748"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144486837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatiotemporally constrained 3D reconstruction from biplanar digital subtraction angiography.","authors":"Sarah Frisken, Vivek Gopalakrishnan, David Dimitris Chlorogiannis, Nazim Haouchine, Alexandre Cafaro, Alexandra J Golby, William M Wells Iii, Rose Du","doi":"10.1007/s11548-025-03427-9","DOIUrl":"10.1007/s11548-025-03427-9","url":null,"abstract":"<p><strong>Purpose: </strong>Our goal is to reconstruct 3D cerebral vessels from two 2D digital subtraction angiography (DSA) images acquired using a biplane scanner. This could provide intraoperative 3D imaging with 2-5 × spatial and 20 × temporal resolution of 3D magnetic resonance angiography, computed tomography angiography (CTA), or rotational DSA. Because many interventional radiology suites have biplane scanners, our method could be easily integrated into clinical workflows.</p><p><strong>Methods: </strong>We present a constrained 3D reconstruction method that utilizes vessel centerlines, radii, and the flow of contrast agent through vessels from DSA. The reconstructed volume samples 'vesselness' at each voxel, i.e., its probability of containing a vessel. We present evaluation metrics which we used to optimize reconstruction parameters and evaluate our method on synthetic data. We provide preliminary results on clinical data. To handle clinical data, we developed a software tool for extracting vessel centerlines, radii, and contrast arrival times from clinical DSA. We provide an automated method for registering DSA to CTA which allows us to compare reconstructed vessels with vessels extracted from CTA.</p><p><strong>Result: </strong>Our method reduced reconstruction artifacts in vesselness volumes for both synthetic and clinical data. In synthetic DSA, where 3D ground-truth vessel centerlines are available, our constrained reconstruction method improved accuracy, selectivity, and Dice scores with two views compared to existing sparse reconstruction methods with up to 16 views.</p><p><strong>Conclusion: </strong>Incorporating additional constraints into 3D reconstruction can successfully reduce artifacts introduced when a complex 3D structure like the brain vasculature is reconstructed from a small number of 2D views.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1689-1701"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350070/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144200762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypothalamus and intracranial volume segmentation at the group level by use of a Gradio-CNN framework.","authors":"Ina Vernikouskaya, Volker Rasche, Jan Kassubek, Hans-Peter Müller","doi":"10.1007/s11548-025-03438-6","DOIUrl":"10.1007/s11548-025-03438-6","url":null,"abstract":"<p><strong>Purpose: </strong>This study aimed to develop and evaluate a graphical user interface (GUI) for the automated segmentation of the hypothalamus and intracranial volume (ICV) in brain MRI scans. The interface was designed to facilitate efficient and accurate segmentation for research applications, with a focus on accessibility and ease of use for end-users.</p><p><strong>Methods: </strong>We developed a web-based GUI using the Gradio library integrating deep learning-based segmentation models trained on annotated brain MRI scans. The model utilizes a U-Net architecture to delineate the hypothalamus and ICV. The GUI allows users to upload high-resolution MRI scans, visualize the segmentation results, calculate hypothalamic volume and ICV, and manually correct individual segmentation results. To ensure widespread accessibility, we deployed the interface using ngrok, allowing users to access the tool via a shared link. As an example for the universality of the approach, the tool was applied to a group of 90 patients with Parkinson's disease (PD) and 39 controls.</p><p><strong>Results: </strong>The GUI demonstrated high usability and efficiency in segmenting the hypothalamus and the ICV, with no significant difference in normalized hypothalamic volume observed between PD patients and controls, consistent with previously published findings. The average processing time per patient volume was 18 s for the hypothalamus and 44 s for the ICV segmentation on a 6 GB NVidia GeForce GTX 1060 GPU. The ngrok-based deployment allowed for seamless access across different devices and operating systems, with an average connection time of less than 5 s.</p><p><strong>Conclusion: </strong>The developed GUI provides a powerful and accessible tool for applications in neuroimaging. The combination of the intuitive interface, accurate deep learning-based segmentation, and easy deployment via ngrok addresses the need for user-friendly tools in brain MRI analysis. This approach has the potential to streamline workflows in neuroimaging research.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1723-1731"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350493/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144235926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model science and modelling informatics for a model identity card/certificate (MIC) in radiology and surgery.","authors":"Heinz U Lemke","doi":"10.1007/s11548-025-03431-z","DOIUrl":"10.1007/s11548-025-03431-z","url":null,"abstract":"<p><strong>Purpose: </strong>A model identity card/certificate (MIC) aims at enhancing trustworthiness in systems that purport to provide intelligent responses to human questions and actions. It should serve those individuals who have a professional need for or who want to feel more comfortable, when writing about or interacting with intelligent machines. The general and/or specific domain models on which recommendations, decisions or actions of these systems are based, reflect in their MIC the level of model relevance, truthfulness and transparency.</p><p><strong>Methods: </strong>In the specific context of CARS, methods and tools for building models and their corresponding templates for a MIC in the domains of radiology and surgery should be drawn from relevant elements of a model science, specifically from mathematical modelling methods (e.g. for model truthfulness) and modelling informatics tools (e.g. for model transparency). Modelling methods for radiology and surgery may be drawn from applied mathematics, mathematical logic and/or syntax graph based text. Examples of supporting tools from modelling informatics are UML, MIMMS, model-based software engineering or model-based medical evidence.</p><p><strong>Results: </strong>For a Model Guided Precision Medicine as defined by SPIE MI 2025, a precise protocol relating to the origins of these models need to be reflected in the corresponding MIC templates for specific medical domains, for example, in radiology or surgery. An example of a MIC template (work-in-progress) in the domain of orthopaedic surgery serves to demonstrate some aspects of model relevance, truthfulness and transparency.</p><p><strong>Conclusion: </strong>Gaining trustworthiness in intelligent systems based on models and related AI tools is a challenging undertaking and raises many critical questions, specifically those related to ascertain model relevance, truthfulness and transparency. The healthcare system, in particular, will have to be concerned about the availability of digital identity certificates for these systems and related artefacts, e.g. digital twins, avatars, robots, intelligent agents, etc. Further development of the elements of a model science with emphasis on modelling informatics may be the right path to take, preferably in cooperation with international R&D groups interested in the realisation of an MGM and corresponding MICs.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1561-1566"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144136417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeuroLens: organ localization using natural language commands for anatomical recognition in surgical training.","authors":"Nevin M Matasyoh, Daniel Delev, Waseem Masalha, Franziska Mathis-Ullrich, Ramy A Zeineldin","doi":"10.1007/s11548-025-03463-5","DOIUrl":"10.1007/s11548-025-03463-5","url":null,"abstract":"<p><strong>Purpose: </strong>This study introduces NeuroLens, a multimodal system designed to enhance anatomical recognition by integrating video with textual and voice inputs. It aims to provide an interactive learning platform for surgical trainees.</p><p><strong>Methods: </strong>NeuroLens employs a multimodal deep learning localization model trained on an Endoscopic Third Ventriculostomy dataset. It processes neuroendoscopic videos with textual or voice descriptions to identify and localize anatomical structures, displaying them as labeled bounding boxes. Usability was evaluated through a questionnaire by five participants, including surgical students and practicing surgeons. The questionnaire included both quantitative and qualitative sections. The quantitative part covered the System Usability Scale (SUS) and assessments of system appearance, functionality, and overall usability, while the qualitative section gathered user feedback and improvement suggestions. The localization model's performance was assessed using accuracy and mean Intersection over Union (mIoU) metrics.</p><p><strong>Results: </strong>The system demonstrates strong usability, with an average SUS score of 71.5, exceeding the threshold for acceptable usability. The localization achieves a predicted class accuracy of 100%, a localization accuracy of 79.69%, and a mIoU of 67.10%. Participant feedback highlights the intuitive design, organization, and responsiveness while suggesting enhancements like 3D visualization.</p><p><strong>Conclusion: </strong>NeuroLens integrates multimodal inputs for accurate anatomical detection and localization, addressing limitations of traditional training. Its strong usability and technical performance make it a valuable tool for enhancing anatomical learning in surgical training. While NeuroLens shows strong usability and performance, its small sample size limits generalizability. Further evaluation with more students and enhancements like 3D visualization will strengthen its effectiveness.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1623-1632"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350519/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144486853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated multimodel segmentation and tracking for AR-guided open liver surgery using scene-aware self-prompting.","authors":"Serouj Khajarian, Michael Schwimmbeck, Konstantin Holzapfel, Johannes Schmidt, Christopher Auer, Stefanie Remmele, Oliver Amft","doi":"10.1007/s11548-025-03381-6","DOIUrl":"10.1007/s11548-025-03381-6","url":null,"abstract":"<p><strong>Purpose: </strong>We introduce a multimodel, real-time semantic segmentation and tracking approach for Augmented Reality (AR)-guided open liver surgery. Our approach leverages foundation models and scene-aware re-prompting strategies to balance segmentation accuracy and inference time as required for real-time AR-assisted surgery applications.</p><p><strong>Methods: </strong>Our approach integrates a domain-specific RGBD model (ESANet), a foundation model for semantic segmentation (SAM), and a semi-supervised video object segmentation model (DeAOT). Models were combined in an auto-promptable pipeline with a scene-aware re-prompting algorithm that adapts to surgical scene changes. We evaluated our approach on intraoperative RGBD videos from 10 open liver surgeries using a head-mounted AR device. Segmentation accuracy (IoU), temporal resolution (FPS), and the impact of re-prompting strategies were analyzed. Comparisons to individual models were performed.</p><p><strong>Results: </strong>Our multimodel approach achieved a median IoU of 71% at 13.2 FPS without re-prompting. Performance of our multimodel approach surpasses that of individual models, yielding better segmentation accuracy than ESANet and better temporal resolution compared to SAM. Our scene-aware re-prompting method reaches the DeAOT performance, with an IoU of 74.7% at 11.5 FPS, even when the DeAOT model uses an ideal reference frame.</p><p><strong>Conclusion: </strong>Our scene-aware re-prompting strategy provides a trade-off between segmentation accuracy and temporal resolution, thus addressing the requirements of real-time AR-guided open liver surgery. The integration of complementary models resulted in robust and accurate segmentation in a complex, real-world surgical settings.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":"1613-1621"},"PeriodicalIF":2.3,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350598/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144042828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}