{"title":"Applying artificial intelligence on EDA sensor data to predict stress on minimally invasive robotic-assisted surgery.","authors":"Daniel Caballero, Manuel J Pérez-Salazar, Juan A Sánchez-Margallo, Francisco M Sánchez-Margallo","doi":"10.1007/s11548-024-03218-8","DOIUrl":"10.1007/s11548-024-03218-8","url":null,"abstract":"<p><strong>Purpose: </strong>This study aims predicting the stress level based on the ergonomic (kinematic) and physiological (electrodermal activity-EDA, blood pressure and body temperature) parameters of the surgeon from their records collected in the previously immediate situation of a minimally invasive robotic surgery activity.</p><p><strong>Methods: </strong>For this purpose, data related to the surgeon's ergonomic and physiological parameters were collected during twenty-six robotic-assisted surgical sessions completed by eleven surgeons with different experience levels. Once the dataset was generated, two preprocessing techniques were applied (scaled and normalized), these two datasets were divided into two subsets: with 80% of data for training and cross-validation, and 20% of data for test. Three predictive techniques (multiple linear regression-MLR, support vector machine-SVM and multilayer perceptron-MLP) were applied on training dataset to generate predictive models. Finally, these models were validated on cross-validation and test datasets. After each session, surgeons were asked to complete a survey of their feeling of stress. These data were compared with those obtained using predictive models.</p><p><strong>Results: </strong>The results showed that MLR combined with the scaled preprocessing achieved the highest R<sup>2</sup> coefficient and the lowest error for each parameter analyzed. Additionally, the results for the surgeons' surveys were highly correlated to the results obtained by the predictive models (R<sup>2</sup> = 0.8253).</p><p><strong>Conclusions: </strong>The linear models proposed in this study were successfully validated on cross-validation and test datasets. This fact demonstrates the possibility of predicting factors that help us to improve the surgeon's health during robotic surgery.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141494219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning-based segmentation of left ventricular myocardium on dynamic contrast-enhanced MRI: a comprehensive evaluation across temporal frames.","authors":"Raufiya Jafari, Radhakrishan Verma, Vinayak Aggarwal, Rakesh Kumar Gupta, Anup Singh","doi":"10.1007/s11548-024-03221-z","DOIUrl":"10.1007/s11548-024-03221-z","url":null,"abstract":"<p><strong>Purpose: </strong>Cardiac perfusion MRI is vital for disease diagnosis, treatment planning, and risk stratification, with anomalies serving as markers of underlying ischemic pathologies. AI-assisted methods and tools enable accurate and efficient left ventricular (LV) myocardium segmentation on all DCE-MRI timeframes, offering a solution to the challenges posed by the multidimensional nature of the data. This study aims to develop and assess an automated method for LV myocardial segmentation on DCE-MRI data of a local hospital.</p><p><strong>Methods: </strong>The study consists of retrospective DCE-MRI data from 55 subjects acquired at the local hospital using a 1.5 T MRI scanner. The dataset included subjects with and without cardiac abnormalities. The timepoint for the reference frame (post-contrast LV myocardium) was identified using standard deviation across the temporal sequences. Iterative image registration of other temporal images with respect to this reference image was performed using Maxwell's demons algorithm. The registered stack was fed to the model built using the U-Net framework for predicting the LV myocardium at all timeframes of DCE-MRI.</p><p><strong>Results: </strong>The mean and standard deviation of the dice similarity coefficient (DSC) for myocardial segmentation using pre-trained network Net_cine is 0.78 ± 0.04, and for the fine-tuned network Net_dyn which predicts mask on all timeframes individually, it is 0.78 ± 0.03. The DSC for Net_dyn ranged from 0.71 to 0.93. The average DSC achieved for the reference frame is 0.82 ± 0.06.</p><p><strong>Conclusion: </strong>The study proposed a fast and fully automated AI-assisted method to segment LV myocardium on all timeframes of DCE-MRI data. The method is robust, and its performance is independent of the intra-temporal sequence registration and can easily accommodate timeframes with potential registration errors.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141535937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From microscope to head-mounted display: integrating hand tracking into microsurgical augmented reality.","authors":"Trishia El Chemaly, Caio Athayde Neves, Fanrui Fu, Brian Hargreaves, Nikolas H Blevins","doi":"10.1007/s11548-024-03224-w","DOIUrl":"10.1007/s11548-024-03224-w","url":null,"abstract":"<p><strong>Purpose: </strong>The operating microscope plays a central role in middle and inner ear procedures that involve working within tightly confined spaces under limited exposure. Augmented reality (AR) may improve surgical guidance by combining preoperative computed tomography (CT) imaging that can provide precise anatomical information, with intraoperative microscope video feed. With current technology, the operator must manually interact with the AR interface using a computer. The latter poses a disruption in the surgical flow and is suboptimal for maintaining the sterility of the operating environment. The purpose of this study was to implement and evaluate free-hand interaction concepts leveraging hand tracking and gesture recognition as an attempt to reduce the disruption during surgery and improve human-computer interaction.</p><p><strong>Methods: </strong>An electromagnetically tracked surgical microscope was calibrated using a custom 3D printed calibration board. This allowed the augmentation of the microscope feed with segmented preoperative CT-derived virtual models. Ultraleap's Leap Motion Controller 2 was coupled to the microscope and used to implement hand-tracking capabilities. End-user feedback was gathered from a surgeon during development. Finally, users were asked to complete tasks that involved interacting with the virtual models, aligning them to physical targets, and adjusting the AR visualization.</p><p><strong>Results: </strong>Following observations and user feedback, we upgraded the functionalities of the hand interaction system. User feedback showed the users' preference for the new interaction concepts that provided minimal disruption of the surgical workflow and more intuitive interaction with the virtual content.</p><p><strong>Conclusion: </strong>We integrated hand interaction concepts, typically used with head-mounted displays (HMDs), into a surgical stereo microscope system intended for AR in otologic microsurgery. The concepts presented in this study demonstrated a more favorable approach to human-computer interaction in a surgical context. They hold potential for a more efficient execution of surgical tasks under microscopic AR guidance.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142005804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A position-enhanced sequential feature encoding model for lung infections and lymphoma classification on CT images.","authors":"Rui Zhao, Wenhao Li, Xilai Chen, Yuchong Li, Baochun He, Yucong Zhang, Yu Deng, Chunyan Wang, Fucang Jia","doi":"10.1007/s11548-024-03230-y","DOIUrl":"10.1007/s11548-024-03230-y","url":null,"abstract":"<p><strong>Purpose: </strong>Differentiating pulmonary lymphoma from lung infections using CT images is challenging. Existing deep neural network-based lung CT classification models rely on 2D slices, lacking comprehensive information and requiring manual selection. 3D models that involve chunking compromise image information and struggle with parameter reduction, limiting performance. These limitations must be addressed to improve accuracy and practicality.</p><p><strong>Methods: </strong>We propose a transformer sequential feature encoding structure to integrate multi-level information from complete CT images, inspired by the clinical practice of using a sequence of cross-sectional slices for diagnosis. We incorporate position encoding and cross-level long-range information fusion modules into the feature extraction CNN network for cross-sectional slices, ensuring high-precision feature extraction.</p><p><strong>Results: </strong>We conducted comprehensive experiments on a dataset of 124 patients, with respective sizes of 64, 20 and 40 for training, validation and testing. The results of ablation experiments and comparative experiments demonstrated the effectiveness of our approach. Our method outperforms existing state-of-the-art methods in the 3D CT image classification problem of distinguishing between lung infections and pulmonary lymphoma, achieving an accuracy of 0.875, AUC of 0.953 and F1 score of 0.889.</p><p><strong>Conclusion: </strong>The experiments verified that our proposed position-enhanced transformer-based sequential feature encoding model is capable of effectively performing high-precision feature extraction and contextual feature fusion in the lungs. It enhances the ability of a standalone CNN network or transformer to extract features, thereby improving the classification performance. The source code is accessible at https://github.com/imchuyu/PTSFE .</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141604436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic robotic doppler sonography of leg arteries.","authors":"Jonas Osburg, Alexandra Scheibert, Marco Horn, Ravn Pater, Floris Ernst","doi":"10.1007/s11548-024-03235-7","DOIUrl":"10.1007/s11548-024-03235-7","url":null,"abstract":"<p><strong>Purpose: </strong>Robot-assisted systems offer an opportunity to support the diagnostic and therapeutic treatment of vascular diseases to reduce radiation exposure and support the limited medical staff in vascular medicine. In the diagnosis and follow-up care of vascular pathologies, Doppler ultrasound has become the preferred diagnostic tool. The study presents a robotic system for automatic Doppler ultrasound examinations of patients' leg vessels.</p><p><strong>Methods: </strong>The robotic system consists of a redundant 7 DoF serial manipulator, to which a 3D ultrasound probe is attached. A compliant control was employed, whereby the transducer was guided along the vessel with a defined contact force. Visual servoing was used to correct the position of the probe during the scan so that the vessel can always be properly visualized. To track the vessel's position, methods based on template matching and Doppler sonography were used.</p><p><strong>Results: </strong>Our system was able to successfully scan the femoral artery of seven volunteers automatically for a distance of 20 cm. In particular, our approach using Doppler ultrasound data showed high robustness and an accuracy of 10.7 (±3.1) px in determining the vessel's position and thus outperformed our template matching approach, whereby an accuracy of 13.9 (±6.4) px was achieved.</p><p><strong>Conclusions: </strong>The developed system enables automated robotic ultrasound examinations of vessels and thus represents an opportunity to reduce radiation exposure and staff workload. The integration of Doppler ultrasound improves the accuracy and robustness of vessel tracking, and could thus contribute to the realization of routine robotic vascular examinations and potential endovascular interventions.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11442516/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141762450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modality redundancy for MRI-based glioblastoma segmentation.","authors":"Selene De Sutter, Joris Wuts, Wietse Geens, Anne-Marie Vanbinst, Johnny Duerinck, Jef Vandemeulebroucke","doi":"10.1007/s11548-024-03238-4","DOIUrl":"10.1007/s11548-024-03238-4","url":null,"abstract":"<p><strong>Purpose: </strong>Automated glioblastoma segmentation from magnetic resonance imaging is generally performed on a four-modality input, including T1, contrast T1, T2 and FLAIR. We hypothesize that information redundancy is present within these image combinations, which can possibly reduce a model's performance. Moreover, for clinical applications, the risk of encountering missing data rises as the number of required input modalities increases. Therefore, this study aimed to explore the relevance and influence of the different modalities used for MRI-based glioblastoma segmentation.</p><p><strong>Methods: </strong>After the training of multiple segmentation models based on nnU-Net and SwinUNETR architectures, differing only in their amount and combinations of input modalities, each model was evaluated with regard to segmentation accuracy and epistemic uncertainty.</p><p><strong>Results: </strong>Results show that T1CE-based segmentation (for enhanced tumor and tumor core) and T1CE-FLAIR-based segmentation (for whole tumor and overall segmentation) can reach segmentation accuracies comparable to the full-input version. Notably, the highest segmentation accuracy for nnU-Net was found for a three-input configuration of T1CE-FLAIR-T1, suggesting the confounding effect of redundant input modalities. The SwinUNETR architecture appears to suffer less from this, where said three-input and the full-input model yielded statistically equal results.</p><p><strong>Conclusion: </strong>The T1CE-FLAIR-based model can therefore be considered as a minimal-input alternative to the full-input configuration. Addition of modalities beyond this does not statistically improve and can even deteriorate accuracy, but does lower the segmentation uncertainty.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11442599/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141876609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative in-vitro assessment of a novel robot-assisted system for cochlear implant electrode insertion.","authors":"Philipp Aebischer, Lukas Anschuetz, Marco Caversaccio, Georgios Mantokoudis, Stefan Weder","doi":"10.1007/s11548-024-03276-y","DOIUrl":"https://doi.org/10.1007/s11548-024-03276-y","url":null,"abstract":"<p><strong>Purpose: </strong>As an increasing number of cochlear implant candidates exhibit residual inner ear function, hearing preservation strategies during implant insertion are gaining importance. Manual implantation is known to induce traumatic force and pressure peaks. In this study, we use a validated in-vitro model to comprehensively evaluate a novel surgical tool that addresses these challenges through motorized movement of a forceps.</p><p><strong>Methods: </strong>Using lateral wall electrodes, we examined two subgroups of insertions: 30 insertions were performed manually by experienced surgeons, and another 30 insertions were conducted with a robot-assisted system under the same surgeons' supervision. We utilized a realistic, validated model of the temporal bone. This model accurately reproduces intracochlear frictional conditions and allows for the synchronous recording of forces on intracochlear structures, intracochlear pressure, and the position and deformation of the electrode array within the scala tympani.</p><p><strong>Results: </strong>We identified a significant reduction in force variation during robot-assisted insertions compared to the conventional procedure, with average values of 12 mN/s and 32 mN/s, respectively. Robotic assistance was also associated with a significant reduction of strong pressure peaks and a 17 dB reduction in intracochlear pressure levels. Furthermore, our study highlights that the release of the insertion tool represents a critical phase requiring surgical training.</p><p><strong>Conclusion: </strong>Robotic assistance demonstrated more consistent insertion speeds compared to manual techniques. Its use can significantly reduce factors associated with intracochlear trauma, highlighting its potential for improved hearing preservation. Finally, the system does not mitigate the impact of subsequent surgical steps like electrode cable routing and cochlear access sealing, pointing to areas in need of further research.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142332055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ladies and Gentlemen! This is no humbug. Why Model-Guided Medicine will become a main pillar for the future healthcare system.","authors":"Mario A Cypko, Dirk Wilhelm","doi":"10.1007/s11548-024-03269-x","DOIUrl":"10.1007/s11548-024-03269-x","url":null,"abstract":"<p><strong>Purpose: </strong>Model-Guided Medicine (MGM) is a transformative approach to health care that offers a comprehensive and integrative perspective that goes far beyond our current concepts. In this editorial, we want to take a closer look at this innovative concept and how health care could benefit from its further development and application.</p><p><strong>Methods: </strong>The information presented here is primarily the opinion of the authors and is based on their knowledge in the fields of information technology, computer science, and medicine. The contents are also the result of numerous discussions and scientific meetings within the CARS Society and the CARS Workshop on Model-Guided Medicine and are substantially stimulated by the available literature on the subject.</p><p><strong>Results: </strong>The current healthcare landscape, with its reliance on isolated data points and broad population-based recommendations, often fails to integrate the dynamic and patient-specific factors necessary for truly personalised care. MGM addresses these limitations by integrating recent advancements in data processing, artificial intelligence, and human-computer interaction for the creation of individual models which integrate the available information and knowledge of patients, healthcare providers, devices, environment, etc. Based on a holistic concept, MGM will become effective tool for modern medicine, which shows a unique ability to assess and analyse interconnected relations and the combined impact of multiple factors on the individual. MGM emphasises transparency, reproducibility, and interpretability, ensuring that models are not black boxes but tools that healthcare professionals can fully understand, validate, and apply in clinical practice.</p><p><strong>Conclusion: </strong>The practical applications of MGM are vast, ranging from optimising individual treatment plans to enhancing the efficiency of entire healthcare systems. The research community is called upon to pioneer new projects that demonstrate MGM's potential, establishing it as a central pillar of future health care, where more personalised, predictive, and effective medical practices will hopefully become the standard.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robotic navigation with deep reinforcement learning in transthoracic echocardiography.","authors":"Yuuki Shida, Souto Kumagai, Hiroyasu Iwata","doi":"10.1007/s11548-024-03275-z","DOIUrl":"https://doi.org/10.1007/s11548-024-03275-z","url":null,"abstract":"<p><strong>Purpose: </strong>The search for heart components in robotic transthoracic echocardiography is a time-consuming process. This paper proposes an optimized robotic navigation system for heart components using deep reinforcement learning to achieve an efficient and effective search technique for heart components.</p><p><strong>Method: </strong>The proposed method introduces (i) an optimized search behavior generation algorithm that avoids multiple local solutions and searches for the optimal solution and (ii) an optimized path generation algorithm that minimizes the search path, thereby realizing short search times.</p><p><strong>Results: </strong>The mitral valve search with the proposed method reaches the optimal solution with a probability of 74.4%, the mitral valve confidence loss rate when the local solution stops is 16.3% on average, and the inspection time with the generated path is 48.6 s on average, which is 56.6% of the time cost of the conventional method.</p><p><strong>Conclusion: </strong>The results indicate that the proposed method improves the search efficiency, and the optimal location can be searched in many cases with the proposed method, and the loss rate of the confidence in the mitral valve was low even when a local solution rather than the optimal solution was reached. It is suggested that the proposed method enables accurate and quick robotic navigation to find heart components.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":2.3,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142300392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-shot prompt-based video encoder for surgical gesture recognition","authors":"Mingxing Rao, Yinhong Qin, Soheil Kolouri, Jie Ying Wu, Daniel Moyer","doi":"10.1007/s11548-024-03257-1","DOIUrl":"https://doi.org/10.1007/s11548-024-03257-1","url":null,"abstract":"<h3 data-test=\"abstract-sub-heading\">Purpose</h3><p>In order to produce a surgical gesture recognition system that can support a wide variety of procedures, either a very large annotated dataset must be acquired, or fitted models must generalize to new labels (so-called zero-shot capability). In this paper we investigate the feasibility of latter option.</p><h3 data-test=\"abstract-sub-heading\">Methods</h3><p>Leveraging the bridge-prompt framework, we prompt-tune a pre-trained vision-text model (CLIP) for gesture recognition in surgical videos. This can utilize extensive outside video data such as text, but also make use of label meta-data and weakly supervised contrastive losses.</p><h3 data-test=\"abstract-sub-heading\">Results</h3><p>Our experiments show that prompt-based video encoder outperforms standard encoders in surgical gesture recognition tasks. Notably, it displays strong performance in zero-shot scenarios, where gestures/tasks that were not provided during the encoder training phase are included in the prediction phase. Additionally, we measure the benefit of inclusion text descriptions in the feature extractor training schema.</p><h3 data-test=\"abstract-sub-heading\">Conclusion</h3><p>Bridge-prompt and similar pre-trained + prompt-tuned video encoder models present significant visual representation for surgical robotics, especially in gesture recognition tasks. Given the diverse range of surgical tasks (gestures), the ability of these models to zero-shot transfer without the need for any task (gesture) specific retraining makes them invaluable.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":null,"pages":null},"PeriodicalIF":3.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}