Medical image analysis: Latest Articles

MSFusion: A multi-source hybrid feature fusion network for accurate grading of invasive breast cancer using H&E-stained histopathological images
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-23 | DOI: 10.1016/j.media.2025.103633
Yuli Chen, Jiayang Bai, Jinjie Wang, Guoping Chen, Xinxin Zhang, Duan-Bo Shi, Xiujuan Lei, Peng Gao, Cheng Lu
{"title":"MSFusion: A multi-source hybrid feature fusion network for accurate grading of invasive breast cancer using H&E-stained histopathological images","authors":"Yuli Chen ,&nbsp;Jiayang Bai ,&nbsp;Jinjie Wang ,&nbsp;Guoping Chen ,&nbsp;Xinxin Zhang ,&nbsp;Duan-Bo Shi ,&nbsp;Xiujuan Lei ,&nbsp;Peng Gao ,&nbsp;Cheng Lu","doi":"10.1016/j.media.2025.103633","DOIUrl":"10.1016/j.media.2025.103633","url":null,"abstract":"<div><div>Invasive breast cancer (IBC) is a prevalent malignant tumor in women, and precise grading plays a pivotal role in ensuring effective treatment and enhancing survival rates. However, accurately grading IBC presents a significant challenge due to its heterogeneous nature and the need to harness the complementary information from multiple nuclei sources in histopathology images. To tackle this critical problem, we introduce a novel multi-source hybrid feature fusion network named MSFusion. This network incorporates two types of hybrid features: deep learning features extracted by a novel Swin Transformer-based multi-branch network called MSwinT, and traditional handcrafted features that capture the morphological characteristics of multi-source nuclei. The primary branch of MSwinT captures the overall characteristics of the original images, while multiple auxiliary branches focus on identifying morphological features from diverse sources of nuclei, including tumor, mitotic, tubular, and epithelial nuclei. At each of the four stages for the branches in MSwinT, a functional KDC (key diagnostic components) fusion block with channel and spatial attentions is proposed to integrate the features extracted by all the branches. Ultimately, we synthesize the multi-source hybrid deep learning features and handcrafted features to improve the accuracy of IBC diagnosis and grading. Our multi-branch MSFusion network is rigorously evaluated on three distinct datasets, including two private clinical datasets (Qilu dataset and QDUH&amp;SHSU dataset) as well as a publicly available Databiox dataset. The experimental results consistently demonstrate that our proposed MSFusion model outperforms the state-of-the-art methods. Specifically, the AUC for the Qilu dataset and QDUH&amp;SHSU dataset are 81.3% and 90.2%, respectively, while the public Databiox dataset yields an AUC of 82.1%.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"104 ","pages":"Article 103633"},"PeriodicalIF":10.7,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144147997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
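The KDC fusion block is described as combining channel and spatial attention to merge features from the primary and auxiliary nuclei branches. A minimal PyTorch sketch of such a channel-then-spatial attention fusion (the module name, channel sizes, and reduction ratio are assumptions for illustration, not the paper's implementation):

```python
import torch
import torch.nn as nn

class KDCFusionSketch(nn.Module):
    """Hypothetical fusion block: concatenates multi-branch features,
    then applies channel attention followed by spatial attention (CBAM-style)."""

    def __init__(self, channels: int, num_branches: int, reduction: int = 16):
        super().__init__()
        self.proj = nn.Conv2d(channels * num_branches, channels, kernel_size=1)
        # channel attention: squeeze spatial dims, re-weight channels
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # spatial attention: compress channels, produce an HxW attention map
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, branch_feats):
        x = self.proj(torch.cat(branch_feats, dim=1))        # (B, C, H, W)
        x = x * self.channel_att(x)
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        x = x * self.spatial_att(torch.cat([avg_map, max_map], dim=1))
        return x

# usage: fuse features from a primary branch and three nuclei branches
feats = [torch.randn(2, 96, 56, 56) for _ in range(4)]
fused = KDCFusionSketch(channels=96, num_branches=4)(feats)
print(fused.shape)  # torch.Size([2, 96, 56, 56])
```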
Learning contrast and content representations for synthesizing magnetic resonance image of arbitrary contrast
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-23 | DOI: 10.1016/j.media.2025.103635
Honglin Xiong, Yulin Wang, Zhenrong Shen, Kaicong Sun, Yu Fang, Yan Chen, Dinggang Shen, Qian Wang
{"title":"Learning contrast and content representations for synthesizing magnetic resonance image of arbitrary contrast","authors":"Honglin Xiong ,&nbsp;Yulin Wang ,&nbsp;Zhenrong Shen ,&nbsp;Kaicong Sun ,&nbsp;Yu Fang ,&nbsp;Yan Chen ,&nbsp;Dinggang Shen ,&nbsp;Qian Wang","doi":"10.1016/j.media.2025.103635","DOIUrl":"10.1016/j.media.2025.103635","url":null,"abstract":"<div><div>Magnetic Resonance Imaging (MRI) produces images with different contrasts, providing complementary information for clinical diagnoses and research. However, acquiring a complete set of MRI sequences can be challenging due to limitations such as lengthy scan time, motion artifacts, hardware constraints, and patient-related factors. To address this issue, we propose a novel method to learn Contrast and Content Representations (CCR) for cross-contrast MRI synthesis. Unlike existing approaches that implicitly model relationships between different contrasts, our key insight is to explicitly separate contrast information from anatomical content, allowing for more flexible and accurate synthesis. CCR learns a unified content representation that captures the underlying anatomical structures common to all contrasts, along with separate contrast representations that encode specific contrast information. By recombining the learned content representation with an arbitrary contrast representation, our method can synthesize MR images of any desired contrast. We validate our approach on both the BraTS 2021 dataset and an in-house dataset with diverse FSE acquisition parameters. Our experiments demonstrate that our CCR framework not only handles diverse input–output contrast combinations using a single trained model but also generalizes to synthesize images of new contrasts unseen during training. Quantitatively, CCR outperforms state-of-the-art methods by an average of 2.9 dB in PSNR and 0.08 in SSIM across all tested combinations. The code is available at <span><span>https://github.com/xionghonglin/Arbitrary_Contrast_MRI_Synthesis</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"104 ","pages":"Article 103635"},"PeriodicalIF":10.7,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144138717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
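The core idea of factoring an image into a shared content code and a per-contrast code, then recombining them to synthesize an arbitrary target contrast, can be sketched as follows (the toy 2D encoders, feature-modulation decoder, and all dimensions are assumptions, not the CCR architecture):

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    # maps an image to a spatial content code shared across contrasts
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class ContrastEncoder(nn.Module):
    # maps an image to a global contrast code (a single vector)
    def __init__(self, ch=32, dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # recombines content with an arbitrary contrast code via channel-wise modulation
    def __init__(self, ch=32, dim=8):
        super().__init__()
        self.affine = nn.Linear(dim, 2 * ch)   # predicts per-channel scale and shift
        self.out = nn.Conv2d(ch, 1, 3, padding=1)
    def forward(self, content, contrast):
        scale, shift = self.affine(contrast).chunk(2, dim=1)
        h = content * scale[..., None, None] + shift[..., None, None]
        return self.out(torch.relu(h))

t1, t2 = torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64)
content = ContentEncoder()(t1)             # anatomy from one image
target_contrast = ContrastEncoder()(t2)    # contrast style from another image
synth = Decoder()(content, target_contrast)
print(synth.shape)  # torch.Size([1, 1, 64, 64])
```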
Automated motor-leg scoring in stroke via a stable graph causality debiasing model
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-15 | DOI: 10.1016/j.media.2025.103643
Rui Guo, Xinyue Li, Miaomiao Xu, Lian Gu, Xiaohua Qian
{"title":"Automated motor-leg scoring in stroke via a stable graph causality debiasing model","authors":"Rui Guo ,&nbsp;Xinyue Li ,&nbsp;Miaomiao Xu ,&nbsp;Lian Gu ,&nbsp;Xiaohua Qian","doi":"10.1016/j.media.2025.103643","DOIUrl":"10.1016/j.media.2025.103643","url":null,"abstract":"<div><div>Difficulty in resisting gravity is a common leg motor impairment in stroke patients, significantly impacting daily life. Automated clinical-level quantification of motor-leg videos based on the National Institutes of Health Stroke Scale is crucial for consistent and timely stroke diagnosis and assessment. However, real-world applications are challenged by interference impacting motion representation and decision-making, leading to performance instability. To address this, we propose a causality debiasing graph convolutional network. This model systematically reduces interference in both motor and non-motor body parts, extracting causal representations from human skeletons to ensure reliable decision-making. Specifically, an intra-class causality enhancement module is first proposed to resolve instability in motor-leg representations. This involves separating skeletal graphs with the same score, generating unbiased samples with similar discriminative features, and improving causal consistency. Subsequently, an inter-class non-causality suppression module is designed to handle biases in non-motor body parts. By decoupling skeletal graphs with different scores, this module constructs biased samples and enhances decision stability despite non-causal factors. Extensive validation on the clinical video dataset highlights the strong performance of our method for motor-leg scoring, achieving an impressive correlation above 0.82 with clinical scores, while independent testing at two additional hospitals further reinforces its stability. Furthermore, performance on another motor-arm scoring task and an additional Parkinsonian gait assessment task also successfully confirmed the method’s reliability. Even when faced with potential real-world interferences, our approach consistently shows substantial value, offering both clinical significance and credibility. In summary, this work provides new insights for daily stroke assessment and telemedicine, with significant potential for widespread clinical adoption.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"104 ","pages":"Article 103643"},"PeriodicalIF":10.7,"publicationDate":"2025-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144134954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
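The scoring model operates on human skeleton graphs; the basic spatial graph-convolution step over body joints looks like this (the joint set, adjacency, and feature sizes are illustrative assumptions, not the paper's network):

```python
import torch
import torch.nn as nn

class SkeletonGCNLayer(nn.Module):
    """One spatial graph convolution over body joints: X' = relu(norm(A) X W)."""

    def __init__(self, in_dim, out_dim, adjacency: torch.Tensor):
        super().__init__()
        a = adjacency + torch.eye(adjacency.size(0))    # add self-loops
        deg = a.sum(dim=1, keepdim=True)
        self.register_buffer("a_norm", a / deg)          # row-normalized adjacency
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):              # x: (batch, joints, in_dim)
        return torch.relu(self.linear(self.a_norm @ x))

# toy 5-joint "leg" chain: hip - knee - ankle - heel - toe
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
A = torch.zeros(5, 5)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

layer = SkeletonGCNLayer(in_dim=3, out_dim=16, adjacency=A)
joints = torch.randn(8, 5, 3)          # 8 frames, 5 joints, (x, y, confidence)
print(layer(joints).shape)             # torch.Size([8, 5, 16])
```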
AttriMIL: Revisiting attention-based multiple instance learning for whole-slide pathological image classification from a perspective of instance attributes
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-14 | DOI: 10.1016/j.media.2025.103631
Linghan Cai, Shenjin Huang, Ye Zhang, Jinpeng Lu, Yongbing Zhang
{"title":"AttriMIL: Revisiting attention-based multiple instance learning for whole-slide pathological image classification from a perspective of instance attributes","authors":"Linghan Cai ,&nbsp;Shenjin Huang ,&nbsp;Ye Zhang ,&nbsp;Jinpeng Lu ,&nbsp;Yongbing Zhang","doi":"10.1016/j.media.2025.103631","DOIUrl":"10.1016/j.media.2025.103631","url":null,"abstract":"<div><div>Multiple instance learning (MIL) is a powerful approach for whole-slide pathological image (WSI) analysis, particularly suited for processing gigapixel-resolution images with slide-level labels. Recent attention-based MIL architectures have significantly advanced weakly supervised WSI classification, facilitating both clinical diagnosis and localization of disease-positive regions. However, these methods often face challenges in differentiating between instances, leading to tissue misidentification and a potential degradation in classification performance. To address these limitations, we propose AttriMIL, an attribute-aware multiple instance learning framework. By dissecting the computational flow of attention-based MIL models, we introduce a multi-branch attribute scoring mechanism that quantifies the pathological attributes of individual instances. Leveraging these quantified attributes, we further establish region-wise and slide-wise attribute constraints to dynamically model instance correlations both within and across slides during training. These constraints encourage the network to capture intrinsic spatial patterns and semantic similarities between image patches, thereby enhancing its ability to distinguish subtle tissue variations and sensitivity to challenging instances. To fully exploit the two constraints, we further develop a pathology adaptive learning technique to optimize pre-trained feature extractors, enabling the model to efficiently gather task-specific features. Extensive experiments on five public datasets demonstrate that AttriMIL consistently outperforms state-of-the-art methods across various dimensions, including bag classification accuracy, generalization ability, and disease-positive region localization. The implementation code is available at <span><span>https://github.com/MedCAI/AttriMIL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103631"},"PeriodicalIF":10.7,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
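The attention-based MIL pooling that AttriMIL revisits, extended here with a per-instance score head standing in for attribute scoring, can be sketched as follows (the dimensions and the single score branch are assumptions, not the AttriMIL design):

```python
import torch
import torch.nn as nn

class AttentionMILSketch(nn.Module):
    """Gated attention pooling over patch embeddings (ABMIL-style),
    with a per-patch score head standing in for instance attribute scoring."""

    def __init__(self, feat_dim=512, hidden=128, num_classes=2):
        super().__init__()
        self.att_v = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.att_u = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Sigmoid())
        self.att_w = nn.Linear(hidden, 1)
        self.instance_score = nn.Linear(feat_dim, num_classes)  # per-patch scores
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, patches):       # patches: (num_patches, feat_dim)
        a = self.att_w(self.att_v(patches) * self.att_u(patches))   # (N, 1)
        a = torch.softmax(a, dim=0)
        bag = (a * patches).sum(dim=0)                               # weighted bag embedding
        return self.classifier(bag), self.instance_score(patches), a

bag_feats = torch.randn(1000, 512)     # 1000 patches from one slide
bag_logits, patch_scores, attn = AttentionMILSketch()(bag_feats)
print(bag_logits.shape, patch_scores.shape, attn.shape)
# torch.Size([2]) torch.Size([1000, 2]) torch.Size([1000, 1])
```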
AdaptFRCNet: Semi-supervised adaptation of pre-trained model with frequency and region consistency for medical image segmentation
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-13 | DOI: 10.1016/j.media.2025.103626
Along He, Yanlin Wu, Zhihong Wang, Tao Li, Huazhu Fu
{"title":"AdaptFRCNet: Semi-supervised adaptation of pre-trained model with frequency and region consistency for medical image segmentation","authors":"Along He ,&nbsp;Yanlin Wu ,&nbsp;Zhihong Wang ,&nbsp;Tao Li ,&nbsp;Huazhu Fu","doi":"10.1016/j.media.2025.103626","DOIUrl":"10.1016/j.media.2025.103626","url":null,"abstract":"<div><div>Recently, large pre-trained models (LPM) have achieved great success, which provides rich feature representation for downstream tasks. Pre-training and then fine-tuning is an effective way to utilize LPM. However, the application of LPM in the medical domain is hindered by the presence of a large number of parameters and a limited amount of labeled data. In clinical practice, there exists a substantial amount of unlabeled data that remains underutilized. Semi-supervised learning emerges as a promising approach to harnessing these unlabeled data effectively. In this paper, we propose semi-supervised adaptation of pre-trained model with frequency and region consistency (AdaptFRCNet) for medical image segmentation. Specifically, the pre-trained model is frozen and the proposed lightweight attention-based adapters (Att_Adapter) are inserted into the frozen backbone for parameter-efficient fine-tuning (PEFT). We propose two consistency regularization strategies for semi-supervised learning: frequency domain consistency (FDC) and multi-granularity region similarity consistency (MRSC). FDC aids in learning features within the frequency domain, and MRSC aims to achieve multiple region-level feature consistencies, capturing local context information effectively. By leveraging the proposed Att_Adapter along with FDC and MRSC, we can effectively and efficiently harness the powerful feature representation capability of the LPM. We conduct extensive experiments on three medical image segmentation datasets, demonstrating significant performance improvements over other state-of-the-art methods. The code is available at <span><span>https://github.com/NKUhealong/AdaptFRCNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103626"},"PeriodicalIF":10.7,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
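Inserting trainable adapters into a frozen pre-trained backbone is the general PEFT pattern the abstract refers to; a sketch of the common bottleneck-adapter form wrapped around a frozen block (this is a generic adapter, not the paper's attention-based Att_Adapter):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as an identity mapping for stable training
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class FrozenBlockWithAdapter(nn.Module):
    """Wraps a pre-trained block: its weights stay frozen, only the adapter trains."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False       # freeze the pre-trained weights
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.block(x))

backbone_block = nn.Linear(256, 256)      # stand-in for one pre-trained transformer block
wrapped = FrozenBlockWithAdapter(backbone_block, dim=256)
tokens = torch.randn(4, 196, 256)
out = wrapped(tokens)
trainable = sum(p.numel() for p in wrapped.parameters() if p.requires_grad)
print(out.shape, trainable)               # torch.Size([4, 196, 256]) 33088
```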
A survey of deep-learning-based radiology report generation using multimodal inputs
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-13 | DOI: 10.1016/j.media.2025.103627
Xinyi Wang, Grazziela Figueredo, Ruizhe Li, Wei Emma Zhang, Weitong Chen, Xin Chen
{"title":"A survey of deep-learning-based radiology report generation using multimodal inputs","authors":"Xinyi Wang ,&nbsp;Grazziela Figueredo ,&nbsp;Ruizhe Li ,&nbsp;Wei Emma Zhang ,&nbsp;Weitong Chen ,&nbsp;Xin Chen","doi":"10.1016/j.media.2025.103627","DOIUrl":"10.1016/j.media.2025.103627","url":null,"abstract":"<div><div>Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources, therefore becoming an important topic in the medical image analysis field. It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data (i.e., medical images, clinical information, medical knowledge, etc.), and produce comprehensive and accurate reports. Recently, numerous works have emerged to address this issue using deep-learning-based methods, such as transformers, contrastive learning, and knowledge-base construction. This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep-learning-based report generation with five main components, including multi-modality data acquisition, data preparation, feature learning, feature fusion and interaction, and report generation. The state-of-the-art methods for each of these components are highlighted. Additionally, we summarize the latest developments in large model-based methods and model explainability, along with public datasets, evaluation methods, current challenges, and future directions in this field. We have also conducted a quantitative comparison between different methods in the same experimental setting. This is the most up-to-date survey that focuses on multi-modality inputs and data fusion for radiology report generation. The aim is to provide comprehensive and rich information for researchers interested in automatic clinical report generation and medical image analysis, especially when using multimodal inputs, and to assist them in developing new algorithms to advance the field.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103627"},"PeriodicalIF":10.7,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144071883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
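The five-component workflow proposed by the survey can be read as a generic pipeline; a skeletal sketch with placeholder stages (all stage functions here are hypothetical stand-ins, not any specific method from the survey):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReportGenPipeline:
    # the survey's five workflow components, expressed as pluggable stages
    acquire: Callable[[], dict]          # 1. multi-modality data acquisition
    prepare: Callable[[dict], dict]      # 2. data preparation
    encode: Callable[[dict], dict]       # 3. feature learning (per modality)
    fuse: Callable[[dict], dict]         # 4. feature fusion and interaction
    generate: Callable[[dict], str]      # 5. report generation

    def run(self) -> str:
        data = self.prepare(self.acquire())
        return self.generate(self.fuse(self.encode(data)))

# toy usage with identity-like placeholder stages
pipe = ReportGenPipeline(
    acquire=lambda: {"image": "chest x-ray", "clinical": "age 54, cough"},
    prepare=lambda d: d,
    encode=lambda d: d,
    fuse=lambda d: d,
    generate=lambda d: f"Report drafted from modalities: {sorted(d.keys())}.",
)
print(pipe.run())
```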
Driven by textual knowledge: A Text-View Enhanced Knowledge Transfer Network for lung infection region segmentation
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-12 | DOI: 10.1016/j.media.2025.103625
Lexin Fang, Xuemei Li, Yunyang Xu, Fan Zhang, Caiming Zhang
{"title":"Driven by textual knowledge: A Text-View Enhanced Knowledge Transfer Network for lung infection region segmentation","authors":"Lexin Fang ,&nbsp;Xuemei Li ,&nbsp;Yunyang Xu ,&nbsp;Fan Zhang ,&nbsp;Caiming Zhang","doi":"10.1016/j.media.2025.103625","DOIUrl":"10.1016/j.media.2025.103625","url":null,"abstract":"<div><div>Lung infections are the leading cause of death among infectious diseases, and accurate segmentation of the infected lung area is crucial for effective treatment. Currently, segmentation methods that rely solely on imaging data have limited accuracy. Incorporating text information enriched with expert knowledge into the segmentation process has emerged as a novel approach. However, previous methods often used unified text encoding strategies for extracting textual features. It failed to adequately emphasize critical details in the text, particularly the spatial location of infected regions. Moreover, the semantic space inconsistency between text and image features complicates cross-modal information transfer. To close these gaps, we propose a <strong>Text-View Enhanced Knowledge Transfer Network (TVE-Net)</strong> that leverages key information from textual data to assist in segmentation and enhance the model’s perception of lung infection locations. The method generates a text view by probabilistically modeling the location information of infected areas in text using a robust, carefully designed positional probability function. By assigning lesion probabilities to each image region, the infected areas’ spatial information from the text view is explicitly integrated into the segmentation model. Once the text view has been introduced, a unified image encoder can be employed to extract text view features, so that both text and images are mapped into the same space. In addition, a self-supervised constraint based on text-view overlap and feature consistency is proposed to enhance the model’s robustness and semi-supervised capability through feature augmentation. Meanwhile, the newly designed multi-stage knowledge transfer module utilizes a globally enhanced cross-attention mechanism to comprehensively learn the implicit correlations between image features and text-view features, enabling effective knowledge transfer from text-view features to image features. Extensive experiments demonstrate that TVE-Net outperforms both unimodal and multimodal methods in both fully supervised and semi-supervised lung infection segmentation tasks, achieving significant improvements on QaTa-COV19 and MosMedData+ datasets.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103625"},"PeriodicalIF":10.7,"publicationDate":"2025-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
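The notion of a "text view" that converts textual location cues into per-region lesion probabilities can be illustrated with a toy quadrant-to-probability mapping (the Gaussian form, phrase vocabulary, and grid size are assumptions for illustration, not the paper's positional probability function):

```python
import numpy as np

def text_view_sketch(grid=(8, 8), phrase=("upper", "left"), sigma=1.5):
    """Toy 'text view': turn a coarse location phrase from a report into a
    lesion-probability map over image regions, using a Gaussian bump centred
    on the named quadrant. Illustrative sketch only."""
    h, w = grid
    row = {"upper": h * 0.25, "lower": h * 0.75}.get(phrase[0], h * 0.5)
    col = {"left": w * 0.25, "right": w * 0.75}.get(phrase[1], w * 0.5)
    ys, xs = np.mgrid[0:h, 0:w]
    prob = np.exp(-((ys - row) ** 2 + (xs - col) ** 2) / (2 * sigma ** 2))
    return prob / prob.max()        # each cell holds a relative lesion probability

view = text_view_sketch(phrase=("lower", "right"))
print(view.shape, round(float(view[6, 6]), 3))   # (8, 8) and the peak value 1.0
```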
Structure-guided MR-to-CT synthesis with spatial and semantic alignments for attenuation correction of whole-body PET/MR imaging
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-10 | DOI: 10.1016/j.media.2025.103622
Jiaxu Zheng, Zhenrong Shen, Lichi Zhang, Qun Chen
{"title":"Structure-guided MR-to-CT synthesis with spatial and semantic alignments for attenuation correction of whole-body PET/MR imaging","authors":"Jiaxu Zheng ,&nbsp;Zhenrong Shen ,&nbsp;Lichi Zhang ,&nbsp;Qun Chen","doi":"10.1016/j.media.2025.103622","DOIUrl":"10.1016/j.media.2025.103622","url":null,"abstract":"<div><div>Image synthesis from Magnetic Resonance (MR) to Computed Tomography (CT) can estimate the electron density of tissues, thereby facilitating Positron Emission Tomography (PET) attenuation correction in whole-body PET/MR imaging. Whole-body MR-to-CT synthesis faces several challenges including the spatial misalignment caused by tissue variety and respiratory movements, and the complex intensity mapping due to large intensity variations across the whole body. However, existing MR-to-CT synthesis methods mainly focus on body sub-regions, making them ineffective in addressing these challenges. Here we propose a novel whole-body MR-to-CT synthesis framework, which consists of three novel modules to tackle these challenges: (1) Structure-Guided Synthesis module leverages structure-guided attention gates to enhance synthetic image quality by diminishing unnecessary contours of soft tissues; (2) Spatial Alignment module yields precise registration between paired MR and CT images by taking into account the impacts of tissue volumes and respiratory movements, thus providing well-aligned ground-truth CT images during training; (3) Semantic Alignment module utilizes contrastive learning to constrain organ-related semantic information, thereby ensuring the semantic authenticity of synthetic CT images. Extensive experiments demonstrate that our method produces visually plausible and semantically accurate CT images, outperforming existing approaches in both synthetic image quality and PET attenuation correction accuracy.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103622"},"PeriodicalIF":10.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144031343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
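Structure-guided attention gating of features, as described for the Structure-Guided Synthesis module, resembles the additive attention gate used in Attention U-Net; a minimal sketch under that assumption (channel sizes and the gating signal are illustrative, not the paper's exact module):

```python
import torch
import torch.nn as nn

class AttentionGateSketch(nn.Module):
    """Additive attention gate: a structure/guidance signal re-weights the image
    features so that irrelevant soft-tissue contours are suppressed."""

    def __init__(self, feat_ch, gate_ch, inter_ch=32):
        super().__init__()
        self.theta = nn.Conv2d(feat_ch, inter_ch, kernel_size=1)   # project features
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)     # project guidance
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)           # attention map

    def forward(self, feat, gate):
        att = torch.sigmoid(self.psi(torch.relu(self.theta(feat) + self.phi(gate))))
        return feat * att                  # (B, feat_ch, H, W), gated features

feat = torch.randn(1, 64, 32, 32)          # image features
gate = torch.randn(1, 64, 32, 32)          # structure guidance at the same scale
out = AttentionGateSketch(feat_ch=64, gate_ch=64)(feat, gate)
print(out.shape)                           # torch.Size([1, 64, 32, 32])
```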
Next-generation surgical navigation: Marker-less multi-view 6DoF pose estimation of surgical instruments
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-10 | DOI: 10.1016/j.media.2025.103613
Jonas Hein, Nicola Cavalcanti, Daniel Suter, Lukas Zingg, Fabio Carrillo, Lilian Calvet, Mazda Farshad, Nassir Navab, Marc Pollefeys, Philipp Fürnstahl
{"title":"Next-generation surgical navigation: Marker-less multi-view 6DoF pose estimation of surgical instruments","authors":"Jonas Hein ,&nbsp;Nicola Cavalcanti ,&nbsp;Daniel Suter ,&nbsp;Lukas Zingg ,&nbsp;Fabio Carrillo ,&nbsp;Lilian Calvet ,&nbsp;Mazda Farshad ,&nbsp;Nassir Navab ,&nbsp;Marc Pollefeys ,&nbsp;Philipp Fürnstahl","doi":"10.1016/j.media.2025.103613","DOIUrl":"10.1016/j.media.2025.103613","url":null,"abstract":"<div><div>State-of-the-art research of traditional computer vision is increasingly leveraged in the surgical domain. A particular focus in computer-assisted surgery is to replace marker-based tracking systems for instrument localization with pure image-based 6DoF pose estimation using deep-learning methods. However, state-of-the-art single-view pose estimation methods do not yet meet the accuracy required for surgical navigation. In this context, we investigate the benefits of multi-view setups for highly accurate and occlusion-robust 6DoF pose estimation of surgical instruments and derive recommendations for an ideal camera system that addresses the challenges in the operating room. Our contributions are threefold. First, we present a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured with static and head-mounted cameras and including rich annotations for surgeon, instruments, and patient anatomy. Second, we perform an extensive evaluation of three state-of-the-art single-view and multi-view pose estimation methods, analyzing the impact of camera quantities and positioning, limited real-world data, and static, hybrid, or fully mobile camera setups on the pose accuracy, occlusion robustness, and generalizability. Third, we design a multi-camera system for marker-less surgical instrument tracking, achieving an average position error of 1.01<!--> <!-->mm and orientation error of 0.89° for a surgical drill, and 2.79<!--> <!-->mm and 3.33° for a screwdriver under optimal conditions. Our results demonstrate that marker-less tracking of surgical instruments is becoming a feasible alternative to existing marker-based systems.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103613"},"PeriodicalIF":10.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
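The reported 1.01 mm / 0.89° figures correspond to standard 6DoF pose error metrics; a small example of how translation and rotation errors are typically computed from predicted and ground-truth poses (standard definitions, not taken from the paper's evaluation code):

```python
import numpy as np

def pose_errors(R_pred, t_pred, R_gt, t_gt):
    """Translation error (same unit as t) and rotation error in degrees,
    the latter being the geodesic angle between the two rotation matrices."""
    t_err = np.linalg.norm(t_pred - t_gt)
    R_rel = R_pred @ R_gt.T
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, np.degrees(np.arccos(cos_angle))

# toy check: a 1 mm offset along x and a 0.9 degree rotation about z
angle = np.radians(0.9)
R_gt = np.eye(3)
R_pred = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_err, r_err = pose_errors(R_pred, np.array([1.0, 0.0, 0.0]), R_gt, np.zeros(3))
print(round(t_err, 2), round(r_err, 2))   # 1.0 0.9
```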
Nested hierarchical group-wise registration with a graph-based subgrouping strategy for efficient template construction
IF 10.7 | Tier 1 (Medicine)
Medical image analysis | Pub Date: 2025-05-10 | DOI: 10.1016/j.media.2025.103624
Tongtong Che, Lin Zhang, Debin Zeng, Yan Zhao, Haoying Bai, Jichang Zhang, Xiuying Wang, Shuyu Li
{"title":"Nested hierarchical group-wise registration with a graph-based subgrouping strategy for efficient template construction","authors":"Tongtong Che ,&nbsp;Lin Zhang ,&nbsp;Debin Zeng ,&nbsp;Yan Zhao ,&nbsp;Haoying Bai ,&nbsp;Jichang Zhang ,&nbsp;Xiuying Wang ,&nbsp;Shuyu Li","doi":"10.1016/j.media.2025.103624","DOIUrl":"10.1016/j.media.2025.103624","url":null,"abstract":"<div><div>Accurate and efficient group-wise registration for medical images is fundamentally important to construct a common template image for population-level analysis. However, current group-wise registration faces the challenges posed by the algorithm’s efficiency and capacity, and adaptability to large variations in the subject populations. This paper addresses these challenges with a novel Nested Hierarchical Group-wise Registration (NHGR) framework. Firstly, to alleviate the registration burden due to significant population variations, a new subgrouping strategy is proposed to serve as a “divide and conquer” mechanism that divides a large population into smaller subgroups. The subgroups with a hierarchical sequence are formed by gradually expanding the scale factors that relate to feature similarity and then conducting registration at the subgroup scale as the multi-scale conquer strategy. Secondly, the nested hierarchical group-wise registration is proposed to conquer the challenges due to the efficiency and capacity of the model from three perspectives. (1) Population level: the global group-wise registration is performed to generate age-related sub-templates from local subgroups progressively to the global population. (2) Subgroup level: the local group-wise registration is performed based on local image distributions to reduce registration error and achieve rapid optimization of sub-templates. (3) Image pair level: a deep multi-resolution registration network is employed for better registration efficiency. The proposed framework was evaluated on the brain datasets of adults and adolescents, respectively from 18 to 96 years and 5 to 21 years. Experimental results consistently demonstrated that our proposed group-wise registration method achieved better performance in terms of registration efficiency, template sharpness, and template centrality.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"103 ","pages":"Article 103624"},"PeriodicalIF":10.7,"publicationDate":"2025-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
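The "divide and conquer" idea of subgrouping a population before template construction can be illustrated with a heavily simplified stand-in (k-means on intensity features replaces the paper's graph-based subgrouping, and plain averaging replaces deformable group-wise registration; everything here is an assumption-level sketch):

```python
import numpy as np

def hierarchical_template_sketch(images, n_subgroups=3, seed=0):
    """Split subjects into subgroups by intensity-feature similarity, average
    within each subgroup to get sub-templates, then average the sub-templates
    into a global template. Real group-wise registration would deformably
    align images at every step; this only illustrates the subgrouping idea."""
    feats = images.reshape(len(images), -1)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), n_subgroups, replace=False)]
    for _ in range(10):                                   # plain k-means iterations
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(n_subgroups):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(0)
    sub_templates = [images[labels == k].mean(0) for k in range(n_subgroups)
                     if np.any(labels == k)]
    return np.mean(sub_templates, axis=0), labels

images = np.random.rand(20, 16, 16)            # 20 toy 2D "subjects"
template, groups = hierarchical_template_sketch(images)
print(template.shape, np.bincount(groups))     # (16, 16) and the subgroup sizes
```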