Title: Rethinking boundary detection in deep learning-based medical image segmentation
Authors: Yi Lin, Dong Zhang, Xiao Fang, Yufan Chen, Kwang-Ting Cheng, Hao Chen
Medical Image Analysis, vol. 103, Article 103615. Published 2025-05-06. DOI: 10.1016/j.media.2025.103615.

Medical image segmentation is a pivotal task within the realms of medical image analysis and computer vision. While current methods have shown promise in accurately segmenting major regions of interest, the precise segmentation of boundary areas remains challenging. In this study, we propose a novel network architecture named CTO, which combines Convolutional Neural Networks (CNNs), Vision Transformer (ViT) models, and explicit edge detection operators to tackle this challenge. CTO surpasses existing methods in terms of segmentation accuracy and strikes a better balance between accuracy and efficiency, without the need for additional data inputs or label injections. Specifically, CTO adheres to the canonical encoder–decoder network paradigm, with a dual-stream encoder network comprising a mainstream CNN stream for capturing local features and an auxiliary StitchViT stream for integrating long-range dependencies. Furthermore, to enhance the model's ability to learn boundary areas, we introduce a boundary-guided decoder network that employs binary boundary masks generated by dedicated edge detection operators to provide explicit guidance during the decoding process. We validate the performance of CTO through extensive experiments conducted on seven challenging medical image segmentation datasets, namely ISIC 2016, PH2, ISIC 2018, CoNIC, LiTS17, BraTS, and BTCV. Our experimental results unequivocally demonstrate that CTO achieves state-of-the-art accuracy on these datasets while maintaining competitive model complexity. The codes have been released at: CTO.

Title: REPAIR: Reciprocal assistance imputation-representation learning for glioma diagnosis with incomplete MRI sequences
Authors: Chuixing Wu, Jincheng Xie, Fangrong Liang, Weixiong Zhong, Ruimeng Yang, Yuankui Wu, Tao Liang, Linjing Wang, Xin Zhen
Medical Image Analysis, vol. 103, Article 103634. Published 2025-05-06. DOI: 10.1016/j.media.2025.103634.

The absence of MRI sequences is a common occurrence in clinical practice, posing a significant challenge for prediction modeling of non-invasive glioma (GM) diagnosis via fusion of multi-sequence MRI. To address this issue, we propose a novel unified reciprocal assistance imputation-representation learning framework (REPAIR) for GM diagnosis modeling with incomplete MRI sequences. REPAIR establishes a cooperative process between missing-value imputation and multi-sequence MRI fusion: existing samples inform the imputation of missing values, which in turn supports the learning of a shared latent representation that reciprocally guides more accurate imputation. To tailor the learned representation for downstream tasks, a novel ambiguity-aware intercorrelation regularization is introduced, correlating imputation ambiguity with its impact on the learned representation via a fuzzy paradigm. Additionally, a multimodal structural calibration constraint is devised to correct for the structural shift caused by missing data, ensuring structural consistency between the learned representations and the actual data. The proposed methodology is extensively validated on eight GM datasets with incomplete MRI sequences and six clinical datasets from other diseases with incomplete imaging modalities. Comprehensive comparisons with state-of-the-art methods demonstrate the competitiveness of our approach for GM diagnosis with incomplete MRI sequences, as well as its potential to generalize to various diseases with missing imaging modalities.

Title: Monocular pose estimation of articulated open surgery tools - in the wild
Authors: Robert Spektor, Tom Friedman, Itay Or, Gil Bolotin, Shlomi Laufer
Medical Image Analysis, vol. 103, Article 103618. Published 2025-05-03. DOI: 10.1016/j.media.2025.103618.

This work presents a framework for monocular 6D pose estimation of surgical instruments in open surgery, addressing challenges such as object articulations, specularity, occlusions, and synthetic-to-real domain adaptation. The proposed approach consists of three main components: (1) a synthetic data generation pipeline that incorporates 3D scanning of surgical tools with articulation rigging and physically-based rendering; (2) a tailored pose estimation framework combining tool detection with pose and articulation estimation; and (3) a training strategy on synthetic and real unannotated video data, employing domain adaptation with automatically generated pseudo-labels. Evaluations conducted on real open-surgery data demonstrate the good performance and real-world applicability of the proposed framework, highlighting its potential for integration into medical augmented reality and robotic systems. The approach eliminates the need for extensive manual annotation of real surgical data.

Title: XCAT 3.0: A comprehensive library of personalized digital twins derived from CT scans
Authors: Lavsen Dahal, Mobina Ghojoghnejad, Liesbeth Vancoillie, Dhrubajyoti Ghosh, Yubraj Bhandari, David Kim, Fong Chi Ho, Fakrul Islam Tushar, Sheng Luo, Kyle J. Lafata, Ehsan Abadi, Ehsan Samei, Joseph Y. Lo, W. Paul Segars
Medical Image Analysis, vol. 103, Article 103636. Published 2025-05-03. DOI: 10.1016/j.media.2025.103636.

Virtual Imaging Trials (VITs) offer a cost-effective and scalable approach for evaluating medical imaging technologies. Computational phantoms, which mimic real patient anatomy and physiology, play a central role in VITs. However, current libraries of computational phantoms face limitations, particularly in terms of sample size and heterogeneity; insufficient representation of the population hampers accurate assessment of imaging technologies across different patient groups. Traditionally, the more realistic computational phantoms were created by manual segmentation, a laborious and time-consuming task that impedes the expansion of phantom libraries. This study presents a framework for creating realistic computational phantoms using a suite of automatic segmentation models together with three forms of automated quality control applied to the segmented organ masks. The result is the release of over 2,500 new XCAT 3.0 computational phantoms, each comprising 140 structures and representing a comprehensive approach to detailed anatomical modeling. The phantoms are provided in both voxelized and surface mesh formats. The framework is combined with an in-house CT scanner simulator to produce realistic CT images, and it has the potential to advance virtual imaging trials, facilitating comprehensive and reliable evaluations of medical imaging technologies. Phantoms may be requested at https://cvit.duke.edu/resources/. Code, model weights, and sample CT images are available at https://xcat-3.github.io/.

Title: Towards Foundation Models and Few-Shot Parameter-Efficient Fine-Tuning for Volumetric Organ Segmentation
Authors: Julio Silva-Rodríguez, Jose Dolz, Ismail Ben Ayed
Medical Image Analysis, vol. 103, Article 103596. Published 2025-05-02. DOI: 10.1016/j.media.2025.103596.

The recent popularity of foundation models and the pre-train-and-adapt paradigm, where a large-scale model is transferred to downstream tasks, is gaining attention for volumetric medical image segmentation. However, current transfer learning strategies devoted to full fine-tuning may require significant resources and yield sub-optimal results when labeled data for the target task is scarce. This makes their applicability in real clinical settings challenging, since these institutions are usually constrained in the data and computational resources needed to develop proprietary solutions. To address this challenge, we formalize Few-Shot Efficient Fine-Tuning (FSEFT), a novel and realistic scenario for adapting medical image segmentation foundation models. This setting considers the key role of both data- and parameter-efficiency during adaptation. Building on a foundation model pre-trained on open-access CT organ segmentation sources, we propose leveraging Parameter-Efficient Fine-Tuning and black-box Adapters to address such challenges. Furthermore, novel efficient adaptation methodologies are introduced, including Spatial black-box Adapters that are better suited to dense prediction tasks and constrained transductive inference that leverages task-specific prior knowledge. Our comprehensive transfer learning experiments confirm the suitability of foundation models in medical image segmentation and unveil the limitations of popular fine-tuning strategies in few-shot scenarios. The project code is available at https://github.com/jusiro/fewshot-finetuning.

Title: Automatic quality control of brain 3D FLAIR MRIs for a clinical data warehouse
Authors: Sophie Loizillon, Simona Bottani, Aurélien Maire, Sebastian Ströer, Lydia Chougar, Didier Dormont, Olivier Colliot, Ninon Burgos, APPRIMAGE Study Group
Medical Image Analysis, vol. 103, Article 103617. Published 2025-05-02. DOI: 10.1016/j.media.2025.103617.

Clinical data warehouses, which have arisen over the last decade, bring together the medical data of millions of patients and offer the potential to train and validate machine learning models in real-world scenarios. The quality of MRIs collected in clinical data warehouses differs significantly from that generally observed in research datasets, reflecting the variability inherent to clinical practice. Consequently, the use of clinical data requires robust quality control tools.

Using a substantial number of pre-existing, manually labelled T1-weighted MR images (5,500) alongside a smaller set of newly labelled FLAIR images (926), we present a novel semi-supervised adversarial domain adaptation architecture that exploits representations shared between MRI sequences through a common feature extractor, while accounting for the specificities of FLAIR through a sequence-specific classification head. The architecture thus consists of a shared, domain-invariant feature extractor, a domain classifier, and two classification heads specific to the source and target sequences, all designed to handle potential class distribution shifts between the source and target data. The primary objectives of this paper were: (1) to identify images that are not proper 3D FLAIR brain MRIs; (2) to rate overall image quality.

For the first objective, our approach demonstrated excellent results, with a balanced accuracy of 89%, comparable to that of human raters. For the second objective, our approach achieved good performance, although lower than that of human raters; nevertheless, it accurately identified bad-quality images (balanced accuracy >79%). In conclusion, the proposed approach overcomes the initial barrier of heterogeneous image quality in clinical data warehouses, thereby facilitating the development of new research using clinical routine 3D FLAIR brain images.

Title: TIER-LOC: Visual Query-based Video Clip Localization in fetal ultrasound videos with a multi-tier transformer
Authors: Divyanshu Mishra, Pramit Saha, He Zhao, Netzahualcoyotl Hernandez-Cruz, Olga Patey, Aris T. Papageorghiou, J. Alison Noble
Medical Image Analysis, vol. 103, Article 103611. Published 2025-05-02. DOI: 10.1016/j.media.2025.103611.

In this paper, we introduce the Visual Query-based task of Video Clip Localization (VQ-VCL) for medical video understanding. Specifically, we aim to retrieve a video clip containing frames similar to a given exemplar frame from a given input video. To solve the task, we propose a novel visual query-based video clip localization model called TIER-LOC. TIER-LOC is designed to improve video clip retrieval, especially in fine-grained videos, by extracting features from different levels, i.e., coarse to fine-grained, referred to as TIERS. The aim is to utilize multi-tier features for detecting subtle differences and adapting to scale or resolution variations, leading to improved video clip retrieval. TIER-LOC has three main components: (1) a Multi-Tier Spatio-Temporal Transformer that fuses spatio-temporal features extracted from multiple tiers of video frames with features from multiple tiers of the visual query, enabling better video understanding; (2) a Multi-Tier, Dual Anchor Contrastive Loss to deal with real-world annotation noise, which can be notable at event boundaries and in videos featuring highly similar objects; (3) a Temporal Uncertainty-Aware Localization Loss designed to reduce the model's sensitivity to imprecise event boundaries by relaxing hard boundary constraints, allowing the model to learn underlying class patterns without being influenced by individual noisy samples. To demonstrate the efficacy of TIER-LOC, we evaluate it on two ultrasound video datasets and an open-source egocentric video dataset. First, we develop a sonographer workflow assistive task model to detect standard-frame clips in fetal ultrasound heart sweeps. Second, we assess our model's performance in retrieving standard-frame clips for detecting fetal anomalies in routine ultrasound scans, using the large-scale PULSE dataset. Lastly, we test our model's performance on an open-source computer vision video dataset by creating a VQ-VCL fine-grained video dataset based on the Ego4D dataset. Our model outperforms the best-performing state-of-the-art model by 7%, 4%, and 4% on the three video datasets, respectively.
{"title":"Deep implicit optimization enables robust learnable features for deformable image registration","authors":"Rohit Jena , Pratik Chaudhari , James C. Gee","doi":"10.1016/j.media.2025.103577","DOIUrl":"10.1016/j.media.2025.103577","url":null,"abstract":"<div><div>Deep Learning in Image Registration (DLIR) methods have been tremendously successful in image registration due to their speed and ability to incorporate weak label supervision at training time. However, existing DLIR methods forego many of the benefits and invariances of optimization methods. The lack of a task-specific inductive bias in DLIR methods leads to suboptimal performance, especially in the presence of domain shift. Our method aims to bridge this gap between statistical learning and optimization by explicitly incorporating optimization as a layer in a deep network. A deep network is trained to predict multi-scale dense feature images that are registered using a black box iterative optimization solver. This optimal warp is then used to minimize image and label alignment errors. By <em>implicitly</em> differentiating end-to-end through an iterative optimization solver, we <em>explicitly</em> exploit invariances of the correspondence matching problem induced by the optimization, while learning registration and label-aware features, and guaranteeing the warp functions to be a local minima of the registration objective in the feature space. Our framework shows excellent performance on in-domain datasets, and is agnostic to domain shift such as anisotropy and varying intensity profiles. For the first time, our method allows switching between arbitrary transformation representations (free-form to diffeomorphic) at test time with zero retraining. End-to-end feature learning also facilitates interpretability of features and arbitrary test-time regularization, which is not possible with existing DLIR methods.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103577"},"PeriodicalIF":10.7,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143906845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UN-SAM: Domain-adaptive self-prompt segmentation for universal nuclei images","authors":"Zhen Chen , Qing Xu , Xinyu Liu , Yixuan Yuan","doi":"10.1016/j.media.2025.103607","DOIUrl":"10.1016/j.media.2025.103607","url":null,"abstract":"<div><div>In digital pathology, precise nuclei segmentation is pivotal yet challenged by the diversity of tissue types, staining protocols, and imaging conditions. Recently, the segment anything model (SAM) revealed overwhelming performance in natural scenarios and impressive adaptation to medical imaging. Despite these advantages, the reliance on labor-intensive manual annotation as segmentation prompts severely hinders their clinical applicability, especially for nuclei image analysis containing massive cells where dense manual prompts are impractical. To overcome the limitations of current SAM methods while retaining the advantages, we propose the domain-adaptive self-prompt SAM framework for Universal Nuclei segmentation (UN-SAM), by providing a fully automated solution with superior performance across different domains. Specifically, to eliminate the labor-intensive requirement of per-nuclei annotations for prompt, we devise a multi-scale Self-Prompt Generation (SPGen) module to revolutionize clinical workflow by automatically generating high-quality mask hints to guide the segmentation tasks. Moreover, to unleash the capability of SAM across a variety of nuclei images, we devise a Domain-adaptive Tuning Encoder (DT-Encoder) to seamlessly harmonize visual features with domain-common and domain-specific knowledge, and further devise a Domain Query-enhanced Decoder (DQ-Decoder) by leveraging learnable domain queries for segmentation decoding in different nuclei domains. Extensive experiments prove that our UN-SAM surpasses state-of-the-arts in nuclei instance and semantic segmentation, especially the generalization capability on unseen nuclei domains. The source code is available at <span><span>https://github.com/CUHK-AIM-Group/UN-SAM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103607"},"PeriodicalIF":10.7,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143927480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: A novel spatial-temporal image fusion method for augmented reality-based endoscopic surgery
Authors: Haochen Shi, Jiangchang Xu, Haitao Li, Shuanglin Jiang, Chaoyu Lei, Huifang Zhou, Yinwei Li, Xiaojun Chen
Medical Image Analysis, vol. 103, Article 103609. Published 2025-05-01. DOI: 10.1016/j.media.2025.103609.

Augmented reality (AR) has significant potential to enhance the identification of critical locations during endoscopic surgeries, where accurate endoscope calibration is essential for ensuring the quality of augmented images. In optical-based surgical navigation systems, asynchrony between the optical tracker and the endoscope can cause the augmented scene to diverge from reality during rapid movements, potentially misleading the surgeon, a challenge that remains unresolved. In this paper, we propose a novel spatial-temporal endoscope calibration method that simultaneously determines the spatial transformation from the image to the optical marker and the temporal latency between the tracking and image acquisition systems. To estimate temporal latency, we use a Monte Carlo method to estimate the intrinsic parameters of the endoscope's imaging system, leveraging a dataset of thousands of calibration samples. This dataset is larger than those typically employed in conventional camera calibration routines, rendering traditional algorithms computationally infeasible within a reasonable timeframe. By introducing latency as an independent variable in the principal hand-eye calibration equation, we developed a weighted algorithm to solve the equation iteratively. This approach eliminates the need for a fixture to stabilize the endoscope during calibration, allowing quicker calibration through handheld flexible movement. Experimental results demonstrate that our method achieves an average 2D error of 7±3 pixels and a pseudo-3D error of 1.2±0.4 mm for stable scenes within 82.4±16.6 seconds, approximately 68% faster in operation time than conventional methods. In dynamic scenes, our method compensates for a virtual-to-reality latency of 11±2 ms, which is shorter than a single frame interval and 5.7 times shorter than that of the uncompensated conventional method. Finally, we successfully integrated the proposed method into our surgical navigation system and validated its feasibility in clinical trials of transnasal optic canal decompression surgery. Our method has the potential to improve the safety and efficacy of endoscopic surgeries, leading to better patient outcomes.