Enhancing generalization in zero-shot multi-label endoscopic instrument classification
Raphaela Maerkl, Tobias Rueckert, David Rauber, Max Gutbrod, Danilo Weber Nunes, Christoph Palm
DOI: 10.1007/s11548-025-03439-5 (published 2025-06-11)

Purpose: Recognizing previously unseen classes with neural networks is a significant challenge due to their limited generalization capabilities. This issue is particularly critical in safety-critical domains such as medical applications, where accurate classification is essential for reliability and patient safety. Zero-shot learning methods address this challenge by utilizing additional semantic data, with their performance relying heavily on the quality of the generated embeddings.

Methods: This work investigates the use of full descriptive sentences, generated by a Sentence-BERT model, as class representations, compared to simpler category-based word embeddings derived from a BERT model. Additionally, the impact of z-score normalization as a post-processing step on these embeddings is explored. The proposed approach is evaluated on a multi-label generalized zero-shot learning task, focusing on the recognition of surgical instruments in endoscopic images from minimally invasive cholecystectomies.

Results: The results demonstrate that combining sentence embeddings and z-score normalization significantly improves model performance. For unseen classes, the AUROC improves from 43.9% to 64.9%, and the multi-label accuracy from 26.1% to 79.5%. Overall performance measured across both seen and unseen classes improves from 49.3% to 64.9% in AUROC and from 37.3% to 65.1% in multi-label accuracy, highlighting the effectiveness of our approach.

Conclusion: These findings demonstrate that sentence embeddings and z-score normalization can substantially enhance the generalization performance of zero-shot learning models. However, as the study is based on a single dataset, future work should validate the method across diverse datasets and application domains to establish its robustness and broader applicability.
{"title":"Two-stream MeshCNN for key anatomical segmentation on the liver surface.","authors":"Xukun Zhang, Sharib Ali, Minghao Han, Yanlan Kang, Xiaoying Wang, Lihua Zhang","doi":"10.1007/s11548-025-03358-5","DOIUrl":"https://doi.org/10.1007/s11548-025-03358-5","url":null,"abstract":"<p><strong>Purpose: </strong>Accurate preoperative segmentation of key anatomical regions on the liver surface is essential for enabling intraoperative navigation and position monitoring. However, current automatic segmentation methods face challenges due to the liver's drastic shape variations and limited data availability. This study aims to develop a two-stream mesh convolutional network (TSMCN) that integrates both global geometric and local topological information to achieve accurate, automatic segmentation of key anatomical regions.</p><p><strong>Methods: </strong>We propose TSMCN, which consists of two parallel streams: the E-stream focuses on extracting topological information from liver mesh edges, while the P-stream captures spatial relationships from coordinate points. These single-perspective features are adaptively fused through a fine-grained aggregation (FGA)-based attention mechanism, generating a robust pooled mesh that preserves task-relevant edges and topological structures. This fusion enhances the model's understanding of the liver mesh and facilitates discriminative feature extraction on the newly pooled mesh.</p><p><strong>Results: </strong>TSMCN was evaluated on 200 manually annotated 3D liver mesh datasets. It outperformed point-based (PointNet++) and edge feature-based (MeshCNN) methods, achieving superior segmentation results on the liver ridge and falciform ligament. The model significantly reduced the 3D Chamfer distance compared to other methods, with particularly strong performance in falciform ligament segmentation.</p><p><strong>Conclusion: </strong>TSMCN provides an effective approach to liver surface segmentation by integrating complementary geometric features. Its superior performance highlights the potential to enhance AR-guided liver surgery through automatic and precise preoperative segmentation of critical anatomical regions.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144267866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Uncertainty estimation for trust attribution to speed-of-sound reconstruction with variational networks
Sonia Laguna, Lin Zhang, Can Deniz Bezek, Monika Farkas, Dieter Schweizer, Rahel A Kubik-Huch, Orcun Goksel
DOI: 10.1007/s11548-025-03402-4 (published 2025-06-10)

Purpose: Speed-of-sound (SoS) is a biomechanical characteristic of tissue, and its imaging can provide a promising biomarker for diagnosis. Reconstructing SoS images from ultrasound acquisitions can be cast as a limited-angle computed-tomography problem, with variational networks being a promising model-based deep learning solution. Some acquired data frames may, however, get corrupted by noise due to, e.g., motion, lack of contact, and acoustic shadows, which in turn negatively affects the resulting SoS reconstructions.

Methods: We propose to use the uncertainty in SoS reconstructions to attribute trust to each individual acquired frame. Given multiple acquisitions, we then use an uncertainty-based automatic selection among these retrospectively, to improve diagnostic decisions. We investigate uncertainty estimation based on Monte Carlo Dropout and Bayesian Variational Inference.

Results: We assess our automatic frame selection method for differential diagnosis of breast cancer, distinguishing between benign fibroadenoma and malignant carcinoma. We evaluate 21 lesions classified as BI-RADS 4, i.e., cases suspicious for malignancy. The most trustworthy frame among four acquisitions of each lesion was identified using uncertainty-based criteria. Selecting a frame informed by uncertainty achieved an area under the curve of 76% and 80% for Monte Carlo Dropout and Bayesian Variational Inference, respectively, outperforming all uncertainty-uninformed baselines, the best of which achieved 64%.

Conclusion: A novel use of uncertainty estimation is proposed for selecting one of multiple data acquisitions for further processing and decision making.
Transformer-based robotic ultrasound 3D tracking for capsule robot in GI tract
Xiaoyun Liu, Changyan He, Mulan Wu, Ann Ping, Anna Zavodni, Naomi Matsuura, Eric Diller
DOI: 10.1007/s11548-025-03445-7 (published 2025-06-09)

Purpose: Ultrasound (US) imaging is a promising modality for real-time monitoring of robotic capsule endoscopes navigating through the gastrointestinal (GI) tract. It offers high temporal resolution and safety but is limited by a narrow field of view, low visibility in gas-filled regions and challenges in detecting out-of-plane motions. This work addresses these issues by proposing a novel robotic ultrasound tracking system capable of long-distance 3D tracking and active re-localization when the capsule is lost due to motion or artifacts.

Methods: We develop a hybrid deep learning-based tracking framework combining convolutional neural networks (CNNs) and a transformer backbone. The CNN component efficiently encodes spatial features, while the transformer captures long-range contextual dependencies in B-mode US images. This model is integrated with a robotic arm that adaptively scans and tracks the capsule. The system's performance is evaluated using ex vivo colon phantoms under varying imaging conditions, with physical perturbations introduced to simulate realistic clinical scenarios.

Results: The proposed system achieved continuous 3D tracking over distances exceeding 90 cm, with a mean centroid localization error of 1.5 mm and over 90% detection accuracy. We demonstrated 3D tracking in a more complex workspace featuring two curved sections to simulate anatomical challenges. This suggests the strong resilience of the tracking system to motion-induced artifacts and geometric variability. The system maintained real-time tracking at 9-12 FPS and successfully re-localized the capsule within seconds after tracking loss, even under gas artifacts and acoustic shadowing.

Conclusion: This study presents a hybrid CNN-transformer system for automatic, real-time 3D ultrasound tracking of capsule robots over long distances. The method reliably handles occlusions, view loss and image artifacts, offering millimeter-level tracking accuracy. It significantly reduces clinical workload through autonomous detection and re-localization. Future work includes improving probe-tissue interaction handling and validating performance in live animal and human trials to assess physiological impacts.
Uncertainty Quantification in Image-based 2D/3D Registration and Its Relationship with Accuracy
Sue Min Cho, Alexander Do, Robert Grupp, Mehran Armand, Russell Taylor, Mathias Unberath
DOI: 10.1007/s11548-025-03417-x (published 2025-06-08)

Purpose: Reliable and accurate 2D/3D registration is essential for image-guided navigation and surgical robotics, enabling precise spatial alignment. This work investigates uncertainty quantification and characterization, addressing challenges specific to 2D/3D registration. Although the problem involves only a few degrees of freedom (DoF), uncertainty in 2D/3D registration is difficult to estimate and interpret because it lacks the dimensional consistency of 2D/2D or 3D/3D registration.

Methods: We model 2D/3D registration as a Maximum A Posteriori (MAP) estimation over the posterior distribution of 3D object poses given 2D fluoroscopic images. Uncertainty is quantified by sampling from an approximate posterior distribution, derived from a similarity function-based likelihood and a prior over the 6DoF pose space, and computing summary statistics and entropy measures from these samples. To characterize this approach, we generate plausible 2D/3D pelvis registrations and conduct experiments to investigate the relationship between uncertainty metrics and registration error.

Results: Ordinary least squares (OLS) regression, a linear model, failed to capture the relationship between uncertainty metrics and registration error (R-squared = 0.023), while XGBoost provided a significantly better fit (R-squared = 0.85). A paired t-test revealed significant differences in prediction accuracy across registration error groups. XGBoost, fit on registrations closer to the correct solution, showed stronger predictive accuracy than the "global" model, which included the full range of errors, and the importance of uncertainty metrics differed between the two models.

Conclusion: This work presents a novel method for uncertainty quantification and characterization in single-view 2D/3D registration. Our results reveal a nonlinear relationship between uncertainty and registration accuracy, with stronger correlations observed in low-error regimes. These insights offer a foundation for better understanding and improving registration reliability in image-guided interventions.
Estimation of tumor coverage after RF ablation of hepatocellular carcinoma using single 2D image slices
Nicole Varble, Ming Li, Laetitia Saccenti, Tabea Borde, Antonio Arrichiello, Anna Christou, Katerina Lee, Lindsey Hazen, Sheng Xu, Riccardo Lencioni, Bradford J Wood
DOI: 10.1007/s11548-025-03423-z (published 2025-06-07)

Purpose: To assess the technical success of radiofrequency ablation (RFA) in patients with hepatocellular carcinoma (HCC), an artificial intelligence (AI) model was developed to estimate the tumor coverage without the need for segmentation or registration tools.

Methods: A secondary retrospective analysis of 550 patients in the multicenter and multinational OPTIMA trial (3-7 cm solitary HCC lesions, randomized to RFA or RFA + LTLD) identified 182 patients with a well-defined pre-RFA tumor and a 1-month post-RFA devascularized ablation zone on enhanced CT. The ground truth, or percent tumor coverage, was determined from semi-automatic 3D tumor and ablation zone segmentation and elastic registration. The isocenters of the tumor and ablation zone were isolated on 2D axial CT images. Feature extraction was performed, and classification and linear regression models were built. Images were augmented, and 728 image pairs were used for training and testing. The percent tumor coverage estimated by the models was compared to the ground truth. Validation was performed on eight patient cases from a separate institution, where RFA was performed and pre- and post-ablation images were collected.

Results: In the testing cohorts, the best performance was achieved by the classification model with moderate data augmentation (AUC = 0.86, TPR = 0.59, TNR = 0.89, accuracy = 69%) and by regression with a random forest (RMSE = 12.6%, MAE = 9.8%). Validation in a separate institution did not achieve accuracy greater than random estimation. Visual review of training cases suggests that poor tumor coverage may be a result of atypical ablation zone shrinkage 1 month post-RFA, which may not be reflected in clinical utilization.

Conclusion: An AI model that uses single 2D image slices at the center of the tumor and of the 1-month post-ablation zone can accurately estimate tumor coverage. In separate validation cohorts, however, translation could be challenging.
Multi-volume rendering using depth buffers for surgical planning in virtual reality
Balázs Faludi, Marek Żelechowski, Maria Licci, Norbert Zentai, Attill Saemann, Daniel Studer, Georg Rauter, Raphael Guzman, Carol Hasler, Gregory F Jost, Philippe C Cattin
DOI: 10.1007/s11548-025-03432-y (published 2025-06-07)

Purpose: Planning highly complex surgeries in virtual reality (VR) provides a user-friendly and natural way to navigate volumetric medical data and can improve the sense of depth and scale. Using ray marching-based volume rendering to display the data has several benefits over traditional mesh-based rendering, such as offering a more accurate and detailed visualization without the need for prior segmentation and meshing. However, volume rendering can be difficult to extend to support multiple intersecting volumes in a scene while maintaining a high enough update rate for a comfortable user experience in VR.

Methods: Upon loading a volume, a rough ad hoc segmentation is performed using a motion-tracked controller. The segmentation is not used to extract a surface mesh and does not need to precisely define the exact surfaces to be rendered, as it only serves to separate the volume into individual sub-volumes, which are rendered in multiple, consecutive volume rendering passes. For each pass, the ray lengths are written into the camera depth buffer at early ray termination and read in subsequent passes to ensure correct occlusion between individual volumes.

Results: We evaluate the performance of the multi-volume renderer using three different use cases and corresponding datasets. We show that the presented approach can avoid dropped frames at the typical update rate of 90 frames per second of a desktop-based VR system and, therefore, provide a comfortable user experience even in the presence of more than twenty individual volumes.

Conclusion: Our proof-of-concept implementation shows the feasibility of VR-based surgical planning systems, which require dynamic and direct manipulation of the original volumetric data without sacrificing rendering performance and user experience.
Hypothalamus and intracranial volume segmentation at the group level by use of a Gradio-CNN framework
Ina Vernikouskaya, Volker Rasche, Jan Kassubek, Hans-Peter Müller
DOI: 10.1007/s11548-025-03438-6 (published 2025-06-06)

Purpose: This study aimed to develop and evaluate a graphical user interface (GUI) for the automated segmentation of the hypothalamus and intracranial volume (ICV) in brain MRI scans. The interface was designed to facilitate efficient and accurate segmentation for research applications, with a focus on accessibility and ease of use for end-users.

Methods: We developed a web-based GUI using the Gradio library, integrating deep learning-based segmentation models trained on annotated brain MRI scans. The model utilizes a U-Net architecture to delineate the hypothalamus and ICV. The GUI allows users to upload high-resolution MRI scans, visualize the segmentation results, calculate hypothalamic volume and ICV, and manually correct individual segmentation results. To ensure widespread accessibility, we deployed the interface using ngrok, allowing users to access the tool via a shared link. As an example for the universality of the approach, the tool was applied to a group of 90 patients with Parkinson's disease (PD) and 39 controls.

Results: The GUI demonstrated high usability and efficiency in segmenting the hypothalamus and the ICV, with no significant difference in normalized hypothalamic volume observed between PD patients and controls, consistent with previously published findings. The average processing time per patient volume was 18 s for the hypothalamus and 44 s for the ICV segmentation on a 6 GB NVIDIA GeForce GTX 1060 GPU. The ngrok-based deployment allowed for seamless access across different devices and operating systems, with an average connection time of less than 5 s.

Conclusion: The developed GUI provides a powerful and accessible tool for applications in neuroimaging. The combination of the intuitive interface, accurate deep learning-based segmentation, and easy deployment via ngrok addresses the need for user-friendly tools in brain MRI analysis. This approach has the potential to streamline workflows in neuroimaging research.
{"title":"DivGI: delve into digestive endoscopy image classification.","authors":"Qi He, Sophia Bano, Danail Stoyanov, Siyang Zuo","doi":"10.1007/s11548-025-03441-x","DOIUrl":"https://doi.org/10.1007/s11548-025-03441-x","url":null,"abstract":"<p><strong>Purpose: </strong>Gastrointestinal (GI) endoscopic imaging involves capturing routine anatomical landmarks and suspected lesions during endoscopic procedures for the clinical diagnosis of GI diseases. These images present three key challenges compared to typical scene images: significant class imbalance, a lack of distinctive features, and high similarity between some categories. While existing research has addressed the issue of image quantity imbalance, the challenges posed by indistinct features and inter-category similarity remain unresolved. This study proposes a unified image classification framework designed to tackle all three of these challenges comprehensively.</p><p><strong>Methods: </strong>We present a novel network architecture, DivGI, which integrates three essential strategies-balanced sampling, fine-grained classification, and multi-label classification-within a single framework. The balanced sampling strategy is implemented via resampling and mix-up techniques, fine-grained classification is enabled through multi-granularity feature learning, and multi-label classification is achieved using hierarchical label joint learning. The performance of our method is validated using three publicly available datasets.</p><p><strong>Results: </strong>Extensive experimental results demonstrate that DivGI significantly improves classification accuracy compared to existing approaches, with Matthews correlation coefficients (MCC) of 91.31% on the HyperKvasir dataset, 86.72% on the Upper GI dataset, and 82.88% on the GastroVision dataset. These results highlight that DivGI is more effective and efficient compared to existing methods.</p><p><strong>Conclusion: </strong>The proposed GI classification network, which incorporates multiple strategies, effectively classifies both routine landmark and suspected lesion images, aiming to facilitate better clinical diagnostics in gastrointestinal endoscopy. The code and data are publicly available at https://github.com/howardchina/DivGI.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144235924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive sensitivity-fisher regularization for heterogeneous transfer learning of vascular segmentation in laparoscopic videos.","authors":"Xinkai Zhao, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kazunari Misawa, Kensaku Mori","doi":"10.1007/s11548-025-03404-2","DOIUrl":"https://doi.org/10.1007/s11548-025-03404-2","url":null,"abstract":"<p><strong>Purpose: </strong>This study aims to enhance surgical safety by developing a method for vascular segmentation in laparoscopic surgery videos with limited visibility. We introduce an adaptive sensitivity-fisher regularization (ASFR) approach to adapt neural networks, initially trained on non-medical datasets, for vascular segmentation in laparoscopic videos.</p><p><strong>Methods: </strong>Our approach utilizes heterogeneous transfer learning by integrating fisher information and sensitivity analysis to mitigate catastrophic forgetting and overfitting caused by limited annotated data in laparoscopic videos. We calculate fisher information to identify and preserve critical model parameters while using sensitivity measures to guide adjustment for new task.</p><p><strong>Results: </strong>The fine-tuned models demonstrated high accuracy in vascular segmentation across various complex video sequences, including those with obscured vessels. For both invisible and visible vessels, our method achieved an average Dice score of 41.3. In addition to outperforming traditional transfer learning approaches, our method exhibited strong adaptability across multiple advanced video segmentation architectures.</p><p><strong>Conclusion: </strong>This study introduces a novel heterogeneous transfer learning approach, ASFR, which significantly enhances the precision of vascular segmentation in laparoscopic videos. ASFR effectively addresses critical challenges in surgical image analysis and paves the way for broader applications in laparoscopic surgery, promising improved patient outcomes and increased surgical efficiency.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144235858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}