{"title":"Calibration-informed metrics for instance-level predictive reliability in medical AI","authors":"Federico Cabitza","doi":"10.1016/j.artmed.2026.103366","DOIUrl":"10.1016/j.artmed.2026.103366","url":null,"abstract":"<div><div>Conventional performance metrics in clinical decision support systems, such as accuracy or sensitivity, fail to reflect the reliability of individual predictions, an essential concern for clinicians operating in high-stakes environments. We introduce a calibration-informed framework featuring two novel metrics: the Local Predictive Value (LPV) and the Credible Predictive Value (CPV). LPV estimates the empirical reliability of a prediction from the observed frequency of correct predictions in the neighborhood of its confidence score. CPV refines this estimate using a Bayesian approach, integrating global predictive values as priors to produce a posterior distribution over correctness probabilities. LPV offers a descriptive, data-driven view of local reliability, while CPV provides a belief-adjusted estimate that mitigates overfitting to sparse local data. Applied to benchmark medical imaging datasets, these metrics yielded locally adaptive, interpretable reliability estimates. Divergences between LPV and CPV flagged cases where local evidence was insufficient or misleading, showing how Bayesian smoothing stabilizes reliability estimates in such cases. 
By combining local calibration with Bayesian inference, LPV and CPV advance the development of medical AI systems that are not only accurate but also interpretable and trustworthy at the individual case level.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103366"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
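A minimal Python sketch of the two metrics, under stated assumptions: LPV is taken as the fraction of correct predictions among validation cases whose confidence falls within a fixed window (`half_width`, a hypothetical parameter) of the query confidence, and CPV as the posterior mean of a Beta-binomial model whose Beta prior is centered on the global predictive value with a hypothetical pseudo-count `prior_strength`. The paper's actual neighborhood definition and prior parameterization may differ.

```python
from typing import List, Optional, Tuple


def _local_outcomes(confidences: List[float], correct: List[int],
                    query_conf: float, half_width: float) -> List[int]:
    # Correctness labels (1 = correct, 0 = wrong) of cases whose confidence
    # lies within +/- half_width of the query confidence (assumed window).
    return [c for s, c in zip(confidences, correct)
            if abs(s - query_conf) <= half_width]


def local_predictive_value(confidences: List[float], correct: List[int],
                           query_conf: float,
                           half_width: float = 0.05) -> Tuple[Optional[float], int]:
    # Empirical correctness frequency in the neighborhood; None if it is empty.
    local = _local_outcomes(confidences, correct, query_conf, half_width)
    if not local:
        return None, 0
    return sum(local) / len(local), len(local)


def credible_predictive_value(confidences: List[float], correct: List[int],
                              query_conf: float, global_pv: float,
                              prior_strength: float = 20.0,
                              half_width: float = 0.05) -> Tuple[float, Tuple[float, float]]:
    # Beta prior centered on the global predictive value (assumption), updated
    # with local successes and failures (conjugate Beta-binomial update).
    a0 = prior_strength * global_pv
    b0 = prior_strength * (1.0 - global_pv)
    local = _local_outcomes(confidences, correct, query_conf, half_width)
    k, n = sum(local), len(local)
    a, b = a0 + k, b0 + (n - k)
    return a / (a + b), (a, b)  # posterior mean and Beta(a, b) parameters
```

With no or few neighbors, CPV stays close to `global_pv`; as local evidence accumulates it moves toward LPV, which is the smoothing behavior the abstract describes. A credible interval could be read off the returned Beta(a, b) parameters, for example with `scipy.stats.beta.ppf`.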
{"title":"Uncertainty in deep learning for EEG under dataset shifts","authors":"Mats Tveter , Thomas Tveitstøl , Christoffer Hatlestad-Hall , Hugo L. Hammer , Ira R.J. Hebold Haraldsen","doi":"10.1016/j.artmed.2026.103374","DOIUrl":"10.1016/j.artmed.2026.103374","url":null,"abstract":"<div><div>As artificial intelligence (AI) is increasingly integrated into medical diagnostics, it is essential that predictive models provide not only accurate outputs but also reliable estimates of uncertainty. In clinical applications, where decisions have significant consequences, understanding the confidence behind each prediction is as critical as the prediction itself. Uncertainty modelling plays a key role in improving trust, guiding decision-making, and identifying unreliable outputs, particularly under dataset shift or in out-of-distribution settings. The primary aim of uncertainty metrics is to align model confidence closely with actual predictive performance, ensuring confidence estimates dynamically adjust to reflect increasing errors or decreasing reliability of predictions. This study investigates how different ensemble learning strategies affect both performance and uncertainty estimation in a clinically relevant task: classifying Normal, Mild Cognitive Impairment, and Dementia from electroencephalography (EEG) data. We evaluated the performance and uncertainty of ensemble methods and Monte Carlo dropout on a large EEG dataset. The models were assessed in three settings: (1) in-distribution performance on a held-out test set, (2) generalisation to three out-of-distribution datasets, and (3) performance under gradual, EEG-specific dataset shifts simulating noise, drift, and frequency perturbation. Ensembles consisting of multiple independently trained models, such as deep ensembles, consistently achieved higher performance in both the in-distribution test set and the out-of-distribution datasets. 
These models also produced more informative and reliable uncertainty estimates under various types of EEG dataset shifts. These results highlight the benefits of ensemble diversity and independent training to build robust and uncertainty-aware EEG classification models. The findings are particularly relevant for clinical applications, where reliability under distribution shift and transparent uncertainty are essential for safe deployment.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103374"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146133538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
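Both deep ensembles and Monte Carlo dropout yield several probability vectors per input (one per member or per stochastic forward pass). A common way to turn them into uncertainty scores, sketched below in Python as an illustration rather than the study's exact metrics, is to average the vectors and split the predictive entropy into an expected-entropy (aleatoric) term and a mutual-information (epistemic) term. The three-class outputs in the usage lines are made up.

```python
import math
from typing import List, Tuple


def entropy(p: List[float]) -> float:
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def ensemble_uncertainty(member_probs: List[List[float]]) -> Tuple[List[float], float, float]:
    """member_probs[m][c]: probability of class c from member (or pass) m.

    Returns the averaged prediction, the total predictive entropy, and the
    mutual information (total minus the members' average entropy).
    """
    m = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / m for c in range(n_classes)]
    total = entropy(mean)
    expected = sum(entropy(p) for p in member_probs) / m
    return mean, total, total - expected


# Classes: Normal, MCI, Dementia (hypothetical softmax outputs).
agree = [[0.8, 0.1, 0.1]] * 3
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
for name, probs in [("agree", agree), ("disagree", disagree)]:
    mean, total, mi = ensemble_uncertainty(probs)
    print(name, [round(x, 3) for x in mean], round(total, 3), round(mi, 3))
```

Mutual information is near zero when members agree and grows when they disagree, which is the kind of signal one would track as EEG inputs are progressively perturbed by noise, drift, or frequency shifts.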
{"title":"Mitigating data center bias in cancer classification: Transfer bias unlearning and feature size reduction via conflict-of-interest free multi-objective optimization","authors":"Farnaz Kheiri , Shahryar Rahnamayan , Masoud Makrehchi","doi":"10.1016/j.artmed.2026.103351","DOIUrl":"10.1016/j.artmed.2026.103351","url":null,"abstract":"<div><div>Bias in the decision-making processes of trained deep models poses a significant threat to their reliability. Such bias can lead to overoptimistic results on observed data while compromising generalization to unseen datasets. Training data may contain hidden patterns related to task-irrelevant attributes, such as data centers, causing models to exploit these unintended correlations rather than learning the main task. This results in biased predictions that favor certain attributes. To address this issue, we propose an unlearning approach based on Conflict-of-Interest-Free Multi-Objective Optimization, designed to train an unlearning layer that explicitly reduces reliance on irrelevant patterns. Our method aims to minimize the gap, caused by biased model behavior, between internal accuracy (evaluated on data centers seen during training) and external accuracy (evaluated on entirely unseen data centers). As a case study, we investigate how data center-specific signatures embedded in cancer-related features can lead to misleadingly high internal performance and a significant drop in performance on test samples from external data centers. Across the methods and objective functions evaluated, our approach achieves strong generalizability on external validation data by jointly reducing feature dimensionality and excluding conflict-of-interest samples during the <span><math><mi>k</mi></math></span>-Nearest Neighbor (KNN) search. We compare our method against multi-task and adversarial learning approaches for bias mitigation. 
Results show that our method outperforms others in narrowing the internal-external performance gap while also improving external validation accuracy. To ensure robustness, we conducted experiments using k-fold cross-validation across k different data centers, further validating the generalizability of our approach. Although this study focuses on cancer-related features and data center biases, the proposed method is model-agnostic and can be applied to any biased feature set extracted by a deep learning model.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103351"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145993609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
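The abstract does not define conflict-of-interest samples precisely. A minimal Python sketch, assuming they are reference samples from the same data center as the query, shows why excluding them can matter: same-center neighbors may win the vote through a shared center signature rather than through tumor features. Function and variable names are hypothetical, the toy data are made up, and the unlearning layer and feature-size reduction are not shown.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(query: Sequence[float], query_center: Hashable,
                ref_feats: List[Sequence[float]], ref_labels: List[str],
                ref_centers: List[Hashable], k: int = 3,
                exclude_same_center: bool = True) -> str:
    # Optionally drop reference samples that share the query's data center
    # (assumed meaning of "conflict-of-interest" samples).
    candidates = [
        (math.dist(query, f), label)
        for f, label, center in zip(ref_feats, ref_labels, ref_centers)
        if not (exclude_same_center and center == query_center)
    ]
    if not candidates:
        raise ValueError("no eligible reference samples")
    candidates.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in candidates[:k])
    return votes.most_common(1)[0][0]  # majority label among the k nearest


feats = [(0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
labels = ["benign", "benign", "tumor", "tumor", "benign"]
centers = ["A", "A", "B", "B", "C"]
print(knn_predict((0.0, 0.0), "A", feats, labels, centers))                             # tumor
print(knn_predict((0.0, 0.0), "A", feats, labels, centers, exclude_same_center=False))  # benign
```

In this toy data the two nearest neighbors of the query come from its own center; once they are excluded, the vote is decided by samples from other centers.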
{"title":"PreLora: A fine-tuning approach with low-rank matrix decomposition and prefix tuning for pre-hospital emergency text classification","authors":"Feng Tian , Xian Wang , Saicong Lu , Jiaxuan Gu , Shitao Zhou , Penghui Li , Zhen Wang , Zengjun Jin , Feifei Zhao , Zhenjie Yang , Longqiang Zhang , Tingting Zhao , Haifang Zhang","doi":"10.1016/j.artmed.2026.103364","DOIUrl":"10.1016/j.artmed.2026.103364","url":null,"abstract":"<div><h3>Objective</h3><div>With expanding applications of artificial intelligence technology in the medical field, Large Language Models (LLMs) have achieved substantial success in medical text processing. However, effectively adapting them to specific tasks, such as pre-hospital emergency text classification, remains challenging.</div></div><div><h3>Methods</h3><div>We propose a novel fine-tuning method, PreLora, which combines prefix tuning with low-rank matrix decomposition. First, it incorporates into the input task-specific prompts generated by a multi-layer perceptron (MLP) encoder. Then, it injects trainable rank-decomposed matrices into every layer of the transformer architecture to reduce the number of trainable parameters and capture correlations within the input. To validate its efficacy, we conducted a comparative evaluation on a pre-hospital emergency text dataset.</div></div><div><h3>Results</h3><div>Comparative results indicated that models fine-tuned with PreLora outperformed the baseline models without fine-tuning, with performance improvements of 45.4%–75.4%. Moreover, PreLora ranked first among all fine-tuning methods for every LLM evaluated. An in-depth performance analysis was further conducted on 21 ICD-10 categories with distinct semantic features. The results revealed a negative correlation between model performance and the semantic similarity of ICD-10 categories: low-similarity groups performed better, while high-similarity groups performed worse. 
Notably, PreLora consistently maintained robust performance, with a smaller decline in high-similarity categories than other fine-tuning methods. In classifying complex cases with high semantic similarity, PreLora still showed superior adaptability, improving by 68.6%–95.8% over the baseline model and by 0.4%–8.4% over other fine-tuning methods.</div></div><div><h3>Conclusion</h3><div>This study demonstrates that PreLora is an effective fine-tuning method for pre-hospital emergency text classification. It could potentially be extended to other mainstream models to adapt them to specific tasks in the medical field.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103364"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146168121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
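The low-rank component follows the standard LoRA idea: a frozen weight matrix W (d × k) is augmented with a trainable update B·A of rank r, scaled by alpha / r, with B initialized to zero so that fine-tuning starts from the pretrained model. The pure-Python sketch below shows that forward pass and the resulting trainable-parameter saving. How PreLora combines it with the MLP-encoded prefix is not specified here and is not shown, and the dimensions in the usage lines are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    # Plain matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: List[float], alpha: float) -> List[float]:
    """y = W x + (alpha / r) * B (A x); W is d x k (frozen), A is r x k, B is d x r."""
    r = len(a)
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y0 + (alpha / r) * dy for y0, dy in zip(base, delta)]


def trainable_params(d: int, k: int, r: int) -> dict:
    # Full fine-tuning updates d*k weights; LoRA trains only A (r x k) and B (d x r).
    return {"full": d * k, "lora": r * (d + k)}


w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]          # rank r = 1
b_init = [[0.0], [0.0]]   # zero init: output equals the frozen layer
print(lora_forward(w, a, b_init, [2.0, 3.0], alpha=1.0))  # [2.0, 3.0]
print(trainable_params(4096, 4096, 8))  # {'full': 16777216, 'lora': 65536}
```

For a 4096 × 4096 projection with rank 8, the trainable update is 256 times smaller than the full matrix, which is where the parameter reduction described in the Methods comes from.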
{"title":"Comprehensive review of heart disease prediction: A comparative study from 2019 onwards","authors":"Monali Gulhane , Sandeep Kumar , Shilpa Choudhary , Nitin Rakesh , Narendra Khatri , Chanderdeep Tandon , Balamurugan Balusamy , Anand Nayyar","doi":"10.1016/j.artmed.2026.103354","DOIUrl":"10.1016/j.artmed.2026.103354","url":null,"abstract":"<div><div>In recent decades, cardiovascular disease (CVD), or heart disease, has been the leading cause of death worldwide, creating an urgent need for timely and accurate early diagnosis. The primary purpose of this review is to examine the current state of the art in heart disease prediction, tracing the shift from traditional diagnostic techniques to modern machine learning and deep learning methods, while maintaining a systematic and comprehensive approach. A critical review of the literature is conducted to assess the effectiveness and limitations of various predictive algorithms. This approach provides historical context, highlights outstanding research needs, and presents recent advancements. The review provides a comprehensive assessment of the challenges in predicting heart disease, including both the identification of specific risk factors and the non-linear interactions among them. The study also examines how the relationship between CVDs and kidney stones may influence the future development of predictive models. 
In conclusion, this study summarizes its key findings in a defined roadmap for future research, emphasizing the potential benefits of applying deep learning methods to enhance diagnostic precision and thus optimize patient management and outcomes.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103354"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Double Graph Attention Network for predicting non-alcoholic fatty liver disease in patients with type 2 diabetes","authors":"Tianbin Chen , Yongbin Zeng , Jinlin Wang , Xiao Sun , Sihao Liu , Ya Fu , Qiang Yi , Qishui Ou , Kai Yan , Zhiheng Zhou","doi":"10.1016/j.artmed.2026.103369","DOIUrl":"10.1016/j.artmed.2026.103369","url":null,"abstract":"<div><div>Type 2 diabetes mellitus (T2DM) is a chronic metabolic disease, while non-alcoholic fatty liver disease (NAFLD) is the most prevalent chronic liver disease, which can progress to more severe liver conditions such as liver fibrosis, cirrhosis, and hepatocellular carcinoma. Approximately 50%–70% of T2DM patients also have NAFLD. Traditional diagnostic methods such as liver biopsy have limitations that make large-scale screening difficult. In the past decade, machine learning methods have emerged as important tools for assisting NAFLD diagnosis. In this paper, we propose a novel Dual Graph Attention Network (DGAN) for diagnosing NAFLD in T2DM patients. We formulate NAFLD diagnosis as a node classification task on a graph constructed from the similarity of patient features. The model includes a Feature Attention Module that captures feature importance through a feature graph and a Patient Attention Module that evaluates patient importance using graph attention mechanisms. These components improve the model’s classification accuracy by leveraging both feature and topological information. 
The model was trained and tested on clinical data from 2402 T2DM patients, demonstrating superior accuracy in identifying NAFLD compared to other models.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103369"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
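To make the graph formulation concrete, the sketch below builds a patient graph by linking each patient to its k most cosine-similar patients and then computes single-head, GAT-style attention weights over each patient's neighborhood. It is a simplified Python illustration of the Patient Attention Module idea under stated assumptions (k-nearest-neighbor graph construction, features already linearly projected, a given attention vector `att_vec`); it is not the DGAN architecture, and the Feature Attention Module is omitted.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0  # treat all-zero feature vectors as dissimilar
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def similarity_graph(features: List[List[float]], k: int) -> List[List[int]]:
    # Link each patient to its k most similar patients (cosine similarity).
    graph = []
    for i, fi in enumerate(features):
        sims = sorted(((cosine(fi, fj), j) for j, fj in enumerate(features) if j != i),
                      reverse=True)
        graph.append([j for _, j in sims[:k]])
    return graph


def attention_weights(h: List[List[float]], graph: List[List[int]],
                      att_vec: List[float], slope: float = 0.2) -> List[Dict[int, float]]:
    # e_ij = LeakyReLU(att_vec . [h_i || h_j]), softmax over N(i) plus i itself.
    weights = []
    for i, nbrs in enumerate(graph):
        nodes = [i] + nbrs
        scores = []
        for j in nodes:
            e = sum(a * x for a, x in zip(att_vec, h[i] + h[j]))  # list concat = [h_i || h_j]
            scores.append(e if e > 0 else slope * e)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        weights.append({j: ex / total for j, ex in zip(nodes, exps)})
    return weights


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = similarity_graph(feats, k=1)
print(graph)  # [[1], [0], [3], [2]]
# Raw features stand in for projected features W h_i here (assumption).
print(attention_weights(feats, graph, att_vec=[0.5, -0.5, 0.5, -0.5])[0])
```

Each patient's representation would then be updated as the attention-weighted sum of its neighbors' features, so similar patients inform one another's NAFLD classification.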
{"title":"IKDP: Implicit Knowledge Enhanced Disease Prediction via heterogeneous admission sequence graphs","authors":"Zongbao Yang , Yuchen Lin , Yichen He , Jinlong Hu , Ruxin Wang , Hao Zhang , Shoubin Dong","doi":"10.1016/j.artmed.2026.103365","DOIUrl":"10.1016/j.artmed.2026.103365","url":null,"abstract":"<div><div>Despite significant advances in deep learning for electronic health record (EHR) modeling, accurately representing complex disease relationships and admission trajectories remains challenging. Current approaches that leverage external knowledge graphs to learn patient representations are often limited by incomplete knowledge coverage. Furthermore, these methods frequently overlook implicit information within patient data, such as inter-patient similarities and latent disease correlations, and often discard patients with only a single admission, thereby losing valuable clinical insights. To address these limitations, we introduce the <strong>I</strong>mplicit <strong>K</strong>nowledge Enhanced <strong>D</strong>isease <strong>P</strong>rediction model (IKDP) via heterogeneous admission sequence graphs (SeqGs), which harnesses implicit knowledge from comprehensive patient admission data. IKDP integrates an auxiliary pre-training strategy with end-to-end optimization to effectively process multi-dimensional patient data and compute inter-patient similarities as complementary knowledge. Specifically, the model constructs SeqGs for each patient, which capture complex disease dependencies and the dynamic evolution of health status. Moreover, critical paths extracted from the SeqGs, combined with similar patient analysis and historical admission records, are utilized to elucidate the reasoning behind predictions. 
The code is available at <span><span>https://github.com/SCUT-CCNL/IKDP</span></span>.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103365"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146090581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel ECG QRS complex detection algorithm based on dynamic Bayesian network","authors":"Qince Li , Yang Liu , Na Zhao , Yongfeng Yuan , Runnan He","doi":"10.1016/j.artmed.2026.103370","DOIUrl":"10.1016/j.artmed.2026.103370","url":null,"abstract":"<div><div>Accurate detection of the QRS complex, a crucial reference for heartbeat localization in electrocardiogram (ECG) signals, remains unreliable in wearable ECG devices because of complex noise interference. In this study, we propose a novel QRS complex detection method based on a dynamic Bayesian network (DBN) that incorporates the probability distribution of RR intervals. Unlike methods that focus solely on ECG waveforms, our approach explicitly integrates ECG waveform and heart rhythm information into a unified probabilistic model, enhancing noise robustness. Additionally, unsupervised parameter optimization using expectation maximization (EM) adapts the model to individual patients. Furthermore, several simplification strategies improve inference efficiency, and an online detection mode enables real-time applications. Our method outperforms other state-of-the-art QRS detection methods, including deep learning (DL) methods, on noisy datasets. 
In conclusion, the proposed DBN-based QRS detection algorithm demonstrates outstanding accuracy, noise robustness, generalization ability, real-time capability, and strong scalability, indicating its potential application in wearable ECG devices.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103370"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146138139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
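The core idea of combining waveform evidence with an RR-interval prior can be illustrated with a much simpler model than a full DBN. Given candidate R-peak times (in seconds) and a per-candidate waveform log-score, both assumed to come from some upstream detector, a Viterbi-style dynamic program selects the beat sequence that maximizes the waveform scores plus a Gaussian log-prior on successive RR intervals. The Python sketch below is that simplified analogue, not the authors' algorithm; `rr_mean`, `rr_std`, and `min_rr` are hypothetical fixed parameters, whereas the paper's method adapts its parameters to each patient with expectation maximization.

```python
import math
from typing import List, Optional


def gaussian_logpdf(x: float, mu: float, sigma: float) -> float:
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def decode_beats(times: List[float], scores: List[float],
                 rr_mean: float, rr_std: float, min_rr: float = 0.2) -> List[float]:
    """Pick the candidate subset maximizing waveform score + RR log-prior."""
    n = len(times)
    if n == 0:
        return []
    best = [0.0] * n
    prev: List[Optional[int]] = [None] * n
    for i in range(n):
        best[i] = scores[i]  # option: candidate i starts a new beat sequence
        for j in range(i):
            rr = times[i] - times[j]
            if rr < min_rr:  # physiologically implausible interval (assumed bound)
                continue
            cand = best[j] + scores[i] + gaussian_logpdf(rr, rr_mean, rr_std)
            if cand > best[i]:
                best[i], prev[i] = cand, j
    i: Optional[int] = max(range(n), key=lambda idx: best[idx])
    path = []
    while i is not None:  # backtrack through the best predecessors
        path.append(i)
        i = prev[i]
    return [times[k] for k in reversed(path)]


# Regular beats every 0.8 s plus a strong noise spike at 1.0 s.
times = [0.0, 0.8, 1.0, 1.6, 2.4]
scores = [2.0, 2.0, 1.5, 2.0, 2.0]
print(decode_beats(times, scores, rr_mean=0.8, rr_std=0.05))  # [0.0, 0.8, 1.6, 2.4]
```

The spike at 1.0 s has a high waveform score, so a pure amplitude threshold might accept it, but it would create RR intervals of 0.2 s and 0.6 s that the rhythm prior penalizes heavily, so the decoder skips it.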
{"title":"From slides to AI-ready maps: Standardized multi-layer tissue maps as metadata for artificial intelligence in digital pathology","authors":"Gernot Fiala , Markus Plass , Robert Harb , Peter Regitnig , Kristijan Skok , Wael Al Zoughbi , Carmen Zerner , Paul Torke , Michaela Kargl , Heimo Müller , Tomas Brazdil , Matej Gallo , Jaroslav Kubín , Roman Stoklasa , Rudolf Nenutil , Norman Zerbe , Andreas Holzinger , Petr Holub","doi":"10.1016/j.artmed.2026.103368","DOIUrl":"10.1016/j.artmed.2026.103368","url":null,"abstract":"<div><div>A Whole Slide Image (WSI) is a high-resolution digital image created by scanning an entire glass slide containing a biological specimen, such as tissue sections or cell samples, at multiple magnifications. These images are digitally viewable, analyzable, and shareable, and are widely used for Artificial Intelligence (AI) algorithm development. WSIs play an important role in pathology for disease diagnosis and in oncology for cancer research, and are also applied in neurology, veterinary medicine, hematology, microbiology, dermatology, pharmacology, toxicology, immunology, and forensic science.</div><div>When assembling cohorts for AI training or validation, it is essential to know the content of a WSI. However, no standard currently exists for this metadata, and cohort selection has largely relied on manual inspection, which does not scale to large collections with millions of objects.</div><div>We propose a general framework to generate 2D index maps (tissue maps) that describe the morphological content of WSIs using common syntax and semantics to achieve interoperability between catalogs. The tissue maps are structured in three layers: source, tissue type, and pathological alterations. 
Each layer assigns WSI segments to specific classes, providing AI-ready metadata.</div><div>We demonstrate the advantages of this standard by applying AI-based metadata extraction from WSIs to generate tissue maps and integrating them into a WSI archive. This integration enhances search capabilities within WSI archives, thereby facilitating the accelerated assembly of high-quality, balanced, and more targeted datasets for AI training, validation, and cancer research.</div></div>","PeriodicalId":55458,"journal":{"name":"Artificial Intelligence in Medicine","volume":"174 ","pages":"Article 103368"},"PeriodicalIF":6.2,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146088117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
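The three-layer structure described above maps naturally onto a simple per-slide data structure. The Python sketch below is a hypothetical illustration, not the proposed standard's syntax: each layer maps a segment (here a grid cell `(row, col)`, an assumed segmentation) to a class label, and a `search` helper returns slides containing a requested tissue type with a requested pathological alteration. Class names such as `"colon mucosa"` are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # grid cell (row, col) of the WSI; assumed segmentation


@dataclass
class TissueMap:
    slide_id: str
    # One class label per segment in each of the three layers.
    source: Dict[Segment, str] = field(default_factory=dict)
    tissue_type: Dict[Segment, str] = field(default_factory=dict)
    alteration: Dict[Segment, str] = field(default_factory=dict)

    def fraction(self, layer: str, cls: str) -> float:
        # Share of labeled segments in `layer` assigned to class `cls`.
        labels = getattr(self, layer)
        if not labels:
            return 0.0
        return sum(1 for v in labels.values() if v == cls) / len(labels)


def search(archive: List[TissueMap], tissue: str, alteration: str,
           min_segments: int = 1) -> List[str]:
    # Slides with at least `min_segments` segments labeled with both classes.
    hits = []
    for tm in archive:
        n = sum(1 for seg, t in tm.tissue_type.items()
                if t == tissue and tm.alteration.get(seg) == alteration)
        if n >= min_segments:
            hits.append(tm.slide_id)
    return hits


wsi1 = TissueMap("WSI-001",
                 source={(0, 0): "colon", (0, 1): "colon"},
                 tissue_type={(0, 0): "colon mucosa", (0, 1): "colon mucosa"},
                 alteration={(0, 0): "adenocarcinoma", (0, 1): "none"})
wsi2 = TissueMap("WSI-002",
                 source={(0, 0): "colon"},
                 tissue_type={(0, 0): "colon mucosa"},
                 alteration={(0, 0): "none"})
print(search([wsi1, wsi2], "colon mucosa", "adenocarcinoma"))  # ['WSI-001']
print(wsi1.fraction("alteration", "adenocarcinoma"))           # 0.5
```

Because every layer shares the same segment keys, queries can combine layers (for example, tumor area within a given tissue type), which is the kind of targeted cohort selection the abstract describes.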