Testing the real-world utility of Bayes theorem in artificial intelligence-enabled electrocardiogram algorithm for the detection of left ventricular systolic dysfunction
Betsy J. Medina-Inojosa, David M. Harmon, Jose R. Medina-Inojosa, Rickey E. Carter, Itzhak Zachi Attia, Paul A. Friedman, Francisco Lopez-Jimenez
Intelligence-based medicine, volume 11, Article 100238. Published 2025-01-01. DOI: 10.1016/j.ibmed.2025.100238. Source: https://www.sciencedirect.com/science/article/pii/S2666521225000420
Abstract
Objective
To assess whether the theoretical principles of Bayes' theorem hold in a clinically meaningful way when testing the diagnostic performance of an artificial intelligence (AI) tool in a “real-world” setting, using as a case study the AI-enabled electrocardiogram (AI-ECG) screening tool that detects left ventricular systolic dysfunction (LVSD).
Patients and methods
We analyzed data from 42,883 consecutive patients who underwent a clinically indicated ECG and an echocardiogram within two weeks of each other at our center between January 1 and December 31, 2019. We then evaluated the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive and negative predictive values (PPV and NPV) of the AI-ECG for detecting LVSD (left ventricular ejection fraction ≤40%) across (i) cumulative risk-factor prevalence (pre-test probabilities) and (ii) different diagnostic thresholds, using paired ECG-echocardiogram data.
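For reference, the predictive values evaluated here are linked to the pre-test probability by Bayes' theorem. A standard statement is given below; the symbols Se (sensitivity), Sp (specificity) and π (pre-test probability, i.e. prevalence) are chosen here for illustration and are not taken from the paper.

```latex
\[
\mathrm{PPV} = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)},
\qquad
\mathrm{NPV} = \frac{\mathrm{Sp}\,(1-\pi)}{\mathrm{Sp}\,(1-\pi) + (1-\mathrm{Se})\,\pi}
\]
```

In the textbook formulation, Se and Sp are treated as properties of the test that do not depend on π, so only PPV and NPV are expected to shift with prevalence.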
Results
Prevalence of LVSD was 1.9%, 4.0%, 7.0% and 13.9% for patients with 0, 1–2, 3–4 and ≥5 risk factors for LVSD, respectively. The AUC of the AI-ECG for each group was 0.955, 0.933, 0.901 and 0.886, respectively (p for trend < 0.001). Pre-test probability had little influence on sensitivity but did affect specificity. PPV was affected more than NPV, which changed only modestly. Diagnostic thresholds affected the diagnostic performance parameters, although their effect on NPV at low pre-test probability was negligible.
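The asymmetry between PPV and NPV can be illustrated with a minimal Python sketch that applies Bayes' theorem to the four reported prevalences. The sensitivity and specificity values below are placeholders assumed purely for illustration (the abstract does not report an operating point), and the function and constant names are hypothetical. The sketch also holds sensitivity and specificity fixed across groups, which the findings above suggest is a simplification for specificity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# LVSD prevalence per risk-factor group, as reported in the Results.
PREVALENCE_BY_GROUP = {"0": 0.019, "1-2": 0.040, "3-4": 0.070, ">=5": 0.139}

# ASSUMPTION: illustrative operating point, NOT taken from the study.
ASSUMED_SENSITIVITY = 0.85
ASSUMED_SPECIFICITY = 0.90


def predictive_values_by_group():
    """Map each risk-factor group to its (PPV, NPV) under the assumed operating point."""
    return {
        group: predictive_values(ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, prev)
        for group, prev in PREVALENCE_BY_GROUP.items()
    }


if __name__ == "__main__":
    for group, (ppv, npv) in predictive_values_by_group().items():
        print(f"{group:>4} risk factors: PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumed values, PPV rises from roughly 0.14 to roughly 0.58 across the four groups while NPV stays above 0.97, which matches the qualitative pattern reported above; the actual magnitudes depend on the study's real operating point.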
Conclusion
In a real-world setting, the pre-test probability of disease (cumulative risk factors) does affect specificity. Varying the diagnostic threshold has the greatest impact on algorithm performance. A Bayesian approach may enhance individualized diagnostic performance when implementing AI algorithms.