Medical Image Analysis, Volume 109, Article 103923. Pub Date: 2026-03-01. Epub Date: 2025-12-22. DOI: 10.1016/j.media.2025.103923
Tongxue Zhou, Zheng Wang, Su Ruan, Yanda Meng, Jinming Duan, Baiying Lei
DFuse-Net: Disentangled feature fusion with uncertainty-aware learning for reliable multi-modal brain tumor segmentation
Abstract: Accurate brain tumor segmentation from multi-modal MRI is critical for clinical diagnosis and treatment planning. However, effectively exploiting the complementary information across different modalities remains challenging due to modality-specific noise, semantic inconsistency, and inherent model uncertainty. To tackle these issues, we propose a Disentangled Fusion Network named DFuse-Net that integrates disentangled feature fusion with uncertainty-aware learning for reliable multi-modal brain tumor segmentation. Specifically, DFuse-Net explicitly disentangles modality-shared and modality-specific representations, enhancing the discriminability and expressiveness of multi-modal features. Furthermore, a Disentangled Texture Fusion Module (DTFM) and a Disentangled Semantic Fusion Module (DSFM) are designed to effectively integrate texture- and semantic-level information across modalities. In addition, a contrastive-aware learning scheme is proposed to strengthen feature discriminability, while a consistency-aware learning strategy enforces structural coherence across modalities. During inference, Monte Carlo dropout is employed to estimate voxel-wise aleatoric and epistemic uncertainties, improving segmentation reliability. Extensive experiments on the BraTS datasets demonstrate that DFuse-Net outperforms state-of-the-art methods, suggesting its potential for reliable clinical application in brain tumor diagnosis and treatment planning.
{"title":"MegaSeg: Towards scalable semantic segmentation for megapixel images","authors":"Solomon Kefas Kaura , Jialun Wu , Zeyu Gao , Chen Li","doi":"10.1016/j.media.2026.103933","DOIUrl":"10.1016/j.media.2026.103933","url":null,"abstract":"<div><div>Megapixel image segmentation is essential for high-resolution histopathology image analysis, but is currently constrained by GPU memory limitations, necessitating patching and downsampling processing that compromises global and local context. This paper introduces MegaSeg, an end-to-end framework for semantic segmentation of megapixel images, leveraging streaming convolutional networks within a U-shaped architecture and a divide-and-conquer strategy. MegaSeg enables efficient semantic segmentation of 8192×8192 pixel images (67 MP) without sacrificing detail or structural context while significantly reducing memory usage. Furthermore, we propose the Attentive Dense Refinement Module (ADRM) to effectively retain and improve local details while capturing contextual information present in high-resolution images in the MegaSeg decoder path. Experiments on public histopathology datasets demonstrate superior performance, preserving both global structure and local details. In CAMELYON16, MegaSeg improves the Free Response Operating Characteristic (FROC) score from 0.78 to 0.89 when the input size is scaled from 4 MP to 67 MP, highlighting its effectiveness for large-scale medical image segmentation.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103933"},"PeriodicalIF":11.8,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145956955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Medical Image Analysis, Volume 109, Article 103937. Pub Date: 2026-03-01. Epub Date: 2026-01-11. DOI: 10.1016/j.media.2026.103937
Bolun Zeng, Yaolin Xu, Peng Wang, Tianyu Lu, Zongyu Xie, Mengsu Zeng, Jianjun Zhou, Liang Liu, Haitao Sun, Xiaojun Chen
C2HFusion: Clinical context-driven hierarchical fusion of multimodal data for personalized and quantitative prognostic assessment in pancreatic cancer
Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive malignancy. Accurate prognostic modeling enables reliable risk stratification to identify patients most likely to benefit from adjuvant therapy, thereby facilitating individualized clinical management and potentially improving patient outcomes. Although recent deep learning approaches have shown promise in this area, their effectiveness is often constrained by fusion strategies that fail to fully capture the hierarchical and complementary information across heterogeneous clinical modalities. To address these limitations, we propose C2HFusion, a novel fusion framework inspired by clinical decision-making for personalized prognostic risk assessment. C2HFusion is unique in that it integrates multimodal data across multiple representational levels and structural forms. At the imaging level, it extracts and aggregates tumor-level features from multi-sequence MRI using cross-attention, effectively capturing complementary imaging patterns. At the patient level, it encodes structured data (e.g., laboratory results, demographics) and unstructured data (e.g., radiology reports) as contextual priors, which are then fused with imaging representations through a novel feature modulation mechanism. To further enhance this cross-level integration, a scalable Mixture-of-Clinical-Experts (MoCE) module dynamically routes different modalities through specialized branches and adaptively optimizes feature fusion for more robust multimodal modeling. Validation on multi-center real-world datasets covering 681 PDAC patients shows that C2HFusion consistently outperforms state-of-the-art methods in overall survival prediction, achieving over a 5% improvement in C-index. These results highlight its potential to improve prognostic accuracy and support more informed, personalized clinical decision-making.
Medical Image Analysis, Volume 109, Article 103945. Pub Date: 2026-03-01. Epub Date: 2026-01-13. DOI: 10.1016/j.media.2026.103945
Tobias Rueckert, David Rauber, Raphaela Maerkl, Leonard Klausmann, Suemeyye R. Yildiran, Max Gutbrod, Danilo Weber Nunes, Alvaro Fernandez Moreno, Imanol Luengo, Danail Stoyanov, Nicolas Toussaint, Enki Cho, Hyeon Bae Kim, Oh Sung Choo, Ka Young Kim, Seong Tae Kim, Gonçalo Arantes, Kehan Song, Jianjun Zhu, Junchen Xiong, Christoph Palm
Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge
Abstract: Reliable recognition and localization of surgical instruments in endoscopic video recordings are foundational for a wide range of applications in computer- and robot-assisted minimally invasive surgery (RAMIS), including surgical training, skill assessment, and autonomous assistance. However, robust performance under real-world conditions remains a significant challenge. Incorporating surgical context, such as the current procedural phase, has emerged as a promising strategy to improve robustness and interpretability.

To address these challenges, we organized the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) sub-challenge as part of the Endoscopic Vision (EndoVis) challenge at MICCAI 2024. We introduced a novel multi-center dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three distinct medical institutions, with unified annotations for three interrelated tasks: surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data while supporting the integration of temporal information across entire procedures.

We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges. The PhaKIR sub-challenge advances the field by providing a unique benchmark for developing temporally aware, context-driven methods in RAMIS and offers a high-quality resource to support future research in surgical scene understanding.
{"title":"AEM: An interpretable multi-task multi-modal framework for cardiac disease prediction","authors":"Jiachuan Peng , Marcel Beetz , Abhirup Banerjee , Min Chen , Vicente Grau","doi":"10.1016/j.media.2026.103951","DOIUrl":"10.1016/j.media.2026.103951","url":null,"abstract":"<div><div>Cardiovascular disease (CVD) is one of the leading causes of death and illness across the world. Especially, early prediction of heart failure (HF) is complicated due to the heterogeneity of its clinical presentations and symptoms. These challenges underscore the need for a multidisciplinary approach for comprehensive evaluation of cardiac state. To this end, we specifically select electrocardiogram (ECG) and 3D cardiac anatomy for their complementary coverage of cardiac electrical activities and fine-grained structural modeling. Building upon this, we present a novel pre-training framework, named Anatomy-Electrocardiogram Model (AEM), to explore their complex interactions. AEM adopts a multi-task self-supervised scheme that combines a masked reconstruction objective with a cardiac measurement (CM) regression branch to embed cardiac functional priors and structural details. Unlike image-domain models that typically localize the whole heart within the image, our 3D anatomy is background-free and continuous in 3D space. Hence, the model can naturally concentrate on finer structures at the patch level. The further integration with ECG captures functional dynamics through electrical conduction, encapsulating holistic cardiac representations. Extensive experiments are conducted on the multi-modal datasets collected from the UK Biobank, which contain paired biventricular point cloud anatomy and 12-lead ECG data. Our proposed AEM achieves an area under the receiver operating characteristic curve of 0.8192 for incident HF prediction and a concordance index of 0.6976 for survival prediction under linear evaluation, outperforming the state-of-the-art multi-modal methods. Additionally, we study the interpretability of the disease prediction by observing that our model effectively recognizes clinically plausible patterns and exhibits a high association with clinical features.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103951"},"PeriodicalIF":11.8,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145995248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Medical Image Analysis, Volume 109, Article 103907. Pub Date: 2026-03-01. Epub Date: 2025-12-11. DOI: 10.1016/j.media.2025.103907
Yifan Gao, Yong’ai Li, Xin Gao
CIA-net: Cross-modality interaction and aggregation network for ovarian tumor segmentation from multi-modal MRI
Abstract: Magnetic resonance imaging (MRI) is an essential examination for ovarian cancer, in which ovarian tumor segmentation is crucial for personalized diagnosis and treatment planning. However, ovarian tumors often present with mixed cystic and solid regions in imaging, posing additional difficulties for automatic segmentation. In clinical practice, radiologists use T2-weighted imaging as the main modality to delineate tumor boundaries, while multi-modal MRI provides complementary information across modalities that can improve tumor segmentation. It is therefore important to fuse salient features from the other modalities into the main modality. In this paper, we propose the cross-modality interaction and aggregation network (CIA-Net), a hybrid convolutional and Transformer architecture for automatic ovarian tumor segmentation from multi-modal MRI. CIA-Net divides multi-modal MRI into one main modality (T2) and three minor modalities (T1, ADC, DWI), each with an independent encoder. A novel cross-modality collaboration block selectively aggregates complementary features from the minor modalities into the main modality through a progressive context injection module. Additionally, we introduce a progressive neighborhood integrated module that filters intra- and inter-modality noise and redundancy by refining adjacent slices of each modality. We evaluate our proposed method on a diverse, multi-center ovarian tumor dataset comprising 739 patients, and further validate its generalization and robustness on two public benchmarks for brain and cardiac segmentation. Comparative experiments with other cutting-edge techniques demonstrate the effectiveness of CIA-Net, highlighting its potential to be applied in clinical scenarios.
{"title":"Segmentation of the right ventricular myocardial infarction in multi-centre cardiac magnetic resonance images","authors":"Chao Xu , Dongaolei An , Chaolu Feng , Zijian Bian , Lian-Ming Wu","doi":"10.1016/j.media.2025.103911","DOIUrl":"10.1016/j.media.2025.103911","url":null,"abstract":"<div><div>Right ventricular myocardial infarction (RVMI) is associated with higher in-hospital morbidity and mortality. Cardiac magnetic resonance (CMR) imaging provides crucial pathological information for diagnosis and/or treatment of RVMI. Segmentation of RVMI in CMR images is significant but challenging. This is because, to the best of our knowledge, there is no publicly available dataset in this field. Furthermore, the severe class imbalance problem caused by mostly less than 0.2 % proportion and the extreme intensity overlap between RVMI and the background bring challenges to the design of segmentation model. Therefore, we release a benchmark CMR dataset, consist of short-axis MR images of 213 subjects from 3 centres acquired by Philips, GE, and Siemens equipments. A multi-stage sequential deep learning model RVMISegNet is proposed to segment RVMI and its related organs at different scales to tackle the class imbalance and intensity overlap problems. In the first stage, transfer learning is employed to localize the right ventricle region. In the second stage, the centroid of the right ventricle guides the extraction of a region of interest, where pseudo-labels are generated to assist a coarse segmentation of myocardial infarction. In the third stage, morphological post-processing is applied, and fine segmentation is performed. Both the coarse and fine segmentation stages use a modified UNet++ backbone, which integrates texture and semantic extraction modules. Extensive experiments validate the state-of-the-art performance of our model and the effectiveness of its constituent modules. The dataset and source codes are available at <span><span>https://github.com/DFLAG-NEU/RVMISegNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":"109 ","pages":"Article 103911"},"PeriodicalIF":11.8,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145730677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Medical Image Analysis, Volume 109, Article 103935. Pub Date: 2026-03-01. Epub Date: 2026-01-08. DOI: 10.1016/j.media.2026.103935
Chaoguang Gong, Lixian Zou, Peng Li, Xingyang Wu, Yangzi Qiao, Zhanqi Hu, Xiaoyan Wang, Yihang Zhou, Kai Wang, Yue Hu, Haifeng Wang
Rapid spatio-temporal MR fingerprinting using physics-informed implicit neural representation
Abstract: The potential of Magnetic Resonance Fingerprinting (MRF), which allows for rapid and simultaneous multi-parametric quantitative MRI, is often limited by severe aliasing artifacts caused by aggressive undersampling. Conventional MRF approaches typically treat these artifacts as detrimental noise and focus on their removal, often at the cost of either reduced reconstruction speed or increased reliance on large training datasets. Building on the insight that structured aliasing can be leveraged as an informative spatial encoding mechanism, we propose to extend MRF's encoding capacity to the global spatio-temporal domain by introducing a novel physics-informed implicit neural MRF (πMRF) framework. πMRF integrates physics-informed spatio-temporal fingerprint modeling with implicit neural representations (INRs), enabling unsupervised, gradient-driven joint estimation of quantitative tissue parameters and coil sensitivity maps (CSMs) with enhanced accuracy and robustness. Specifically, πMRF leverages a scalable component based on physics-informed neural networks (PINNs) to facilitate accurate high-dimensional signal modeling and memory-efficient optimization. In addition, a subspace-guided sensitivity regularization is developed to improve the robustness of CSM estimation in highly undersampled scenarios. Experimental results on simulated, phantom, and in vivo datasets demonstrate that πMRF achieves improved quantitative accuracy and robustness even under highly accelerated acquisitions, outperforming state-of-the-art MRF methods.
Medical Image Analysis, Volume 109, Article 103934. Pub Date: 2026-03-01. Epub Date: 2026-01-11. DOI: 10.1016/j.media.2026.103934
Jungwook Lee, Xuanang Xu, Daeseung Kim, Tianshu Kuang, Hannah H. Deng, Xinrui Song, Yasmine Soubra, Michael A.K. Liebschner, Jaime Gateno, Pingkun Yan
Facial appearance prediction for orthognathic surgery with diffusion models
Abstract: Orthognathic surgery corrects craniomaxillofacial deformities by repositioning skeletal structures to improve facial aesthetics and function. Conventional orthognathic surgical planning is largely bone-driven: bone repositioning is first defined and soft-tissue outcomes are then predicted. However, this approach is limited by its reliance on surgeon-defined bone plans and its inability to directly optimize for patient-specific aesthetic outcomes. To address these limitations, the soft-tissue-driven paradigm seeks to first predict a patient-specific optimal facial appearance and subsequently derive the skeletal changes required to achieve it. In this work, we introduce FAPOS (Facial Appearance Prediction for Orthognathic Surgery), a novel transformer-based latent diffusion framework that directly predicts a normal-looking 3D facial outcome from pre-operative scans to enable soft-tissue-driven planning. FAPOS utilizes a dense 282-landmark representation and is trained on a combined dataset of 44,602 public 3D faces, overcoming the limitations of data scarcity and lack of correspondence. Our three-phase training pipeline combines geometric encoding, latent diffusion modeling, and patient-specific conditioning. Quantitative and qualitative results show that FAPOS outperforms prior methods with improved facial symmetry and identity preservation. These results mark an important step toward enabling soft-tissue-driven surgical planning, with FAPOS providing an optimal facial target that serves as the basis for estimating the skeletal adjustments in subsequent stages.
Medical Image Analysis, Volume 109, Article 103921. Pub Date: 2026-03-01. Epub Date: 2025-12-23. DOI: 10.1016/j.media.2025.103921
Xiao Chen, Pang Lyu, Wencheng Han, Liyang Yang, Yi Jin, Teng Zhang, Jianbing Shen
MPLDM: Multi-modal prosthetic loosening diagnostic model for total hip arthroplasty
Abstract: Total hip arthroplasty (THA) is an effective procedure for restoring hip joint function and typically yields satisfactory clinical outcomes. Aseptic loosening and periprosthetic joint infection (PJI) are severe complications following THA. Accurate diagnosis of these complications requires the integration of various clinical data, including X-ray images, CT scans, medical records, and laboratory test results. To efficiently utilize these critical data and robustly diagnose postoperative complications, we present the Multi-modal Prosthetic Loosening Diagnostic Model (MPLDM). MPLDM employs four independent encoders to convert the four types of data into unified feature tokens. To enhance the interaction between relevant modalities, we introduce three types of attention mechanisms that focus on the most important information. Finally, a multi-modal fusion module combines the information from these modalities to produce a robust prediction. Notably, our model is designed to generalize well even when some input modalities are missing, accommodating patients who cannot provide certain types of data. To effectively train and evaluate the proposed model, we construct a multi-modal dataset for diagnosing prosthetic loosening after THA. This dataset includes input data for all four modalities, along with diagnostic results from senior clinical doctors. MPLDM showed excellent performance, with mean precision, recall, and F1-score of 0.9140, 0.8353, and 0.8645, respectively. Experiments demonstrate that, compared to single-modal and other multi-modal baselines, our method shows superior accuracy and robustness in diagnosing both aseptic loosening and PJI-induced loosening.