Predicting Clostridioides difficile infection outcomes with explainable machine learning.

IF 9.7 | Zone 1 (Medicine) | JCR Q1: MEDICINE, RESEARCH & EXPERIMENTAL

EBioMedicine | Pub Date: 2024-08-01 | Epub Date: 2024-07-17 | DOI: 10.1016/j.ebiom.2024.105244

Gregory R Madden, Rachel H Boone, Emmanuel Lee, Costi D Sifri, William A Petri


Citations: 0

Abstract


Predicting Clostridioides difficile infection outcomes with explainable machine learning.

Background: Clostridioides difficile infection results in life-threatening short-term outcomes and the potential for subsequent recurrent infection. Predicting these outcomes at diagnosis, when important clinical decisions need to be made, has proven to be a difficult task.

Methods: 52 clinical features from existing models or the literature were collected retrospectively within ±48 h of diagnosis among 1660 inpatient infections. A modified desirability of outcome ranking (DOOR) was designed to encompass clinically important severe events attributable to the acute infection (intensive care transfer due to sepsis, shock, colectomy/ileostomy, mortality) and/or 60-day recurrence. A deep neural network was constructed and interpreted using SHapley Additive exPlanations (SHAP). High-importance features were used to train a reduced, shallow network, and performance was compared to existing conventional models (7 severity, 7 recurrence; after summing DOOR probabilities to align with conventional binary outputs) using area under the ROC curve (AUROC) and DeLong tests.
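The step of summing DOOR probabilities into a binary score and then computing AUROC can be illustrated with a minimal sketch. The four-class DOOR layout, the class indices, and the toy numbers below are assumptions made for illustration only; the abstract does not specify the actual category structure, and the DeLong comparison is not implemented here.

```python
from typing import Sequence

# Hypothetical DOOR layout (an assumption, not taken from the paper):
#   index 0: no severe event, no recurrence
#   index 1: recurrence only
#   index 2: severe event only
#   index 3: severe event and recurrence
SEVERE_CLASSES = (2, 3)
RECURRENCE_CLASSES = (1, 3)


def collapse(probs: Sequence[float], classes: Sequence[int]) -> float:
    """Sum the DOOR class probabilities that imply a given binary outcome."""
    return sum(probs[c] for c in classes)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the chance a positive case outranks a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predicted DOOR probabilities for four patients (made-up numbers).
door_probs = [
    [0.70, 0.15, 0.10, 0.05],
    [0.20, 0.10, 0.50, 0.20],
    [0.40, 0.40, 0.15, 0.05],
    [0.10, 0.10, 0.60, 0.20],
]
severe_labels = [0, 1, 0, 1]  # toy ground truth for the severity outcome

severe_scores = [collapse(p, SEVERE_CLASSES) for p in door_probs]
severity_auroc = auroc(severe_scores, severe_labels)
```

The same `collapse` call with `RECURRENCE_CLASSES` would give a recurrence score, which is how a single multi-class output can be compared against binary severity and recurrence models separately.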

Findings: The full (52-feature) model achieved an out-of-sample AUROC of 0.823 for severity and 0.678 for recurrence. SHAP identified 13 unique, highly important features (age, hypotension, initial treatment, onset, PCR cycle threshold, number of prior episodes, antibiotic exposure, fever, hypotension, pressors, leukocytosis, creatinine, lactate) that were used to train a reduced model, which performed similarly to the full model (severity AUROC difference P = 0.130; recurrence P = 0.426) and significantly better than the top severity model (reduced model severity AUROC 0.837, ATLAS 0.749; P = 0.001). The reduced model also outperformed the top recurrence model, but the difference was not statistically significant (reduced model recurrence AUROC 0.653, IDSA Recurrence Risk Criteria 0.595; P = 0.196). The final, reduced model was deployed as a web application with real-time SHAP explanations.
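The idea of ranking features by SHAP importance and keeping the top few can be sketched as follows. This is not the authors' pipeline: it computes exact Shapley values by enumerating feature coalitions for a toy linear function, whereas the study applied SHAP to a neural network. The model, feature names, baseline, and data are all hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; absent features take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Toy stand-in for a trained model (hypothetical weights; third feature is unused).
FEATURES = ["age", "lactate", "noise"]


def toy_model(z: List[float]) -> float:
    return 0.5 * z[0] + 2.0 * z[1] + 0.0 * z[2]


X = [[1.0, 1.0, 1.0], [3.0, 0.0, 2.0], [2.0, 2.0, 0.0]]  # made-up samples
baseline = [2.0, 1.0, 1.0]                               # column means of X

all_phi = [shapley_values(toy_model, row, baseline) for row in X]

# Global importance: mean absolute Shapley value per feature across samples.
importance = [sum(abs(phi[i]) for phi in all_phi) / len(all_phi)
              for i in range(len(FEATURES))]
ranked = sorted(range(len(FEATURES)), key=lambda i: importance[i], reverse=True)
top_features = [FEATURES[i] for i in ranked[:2]]  # keep the 2 most important
```

Exact enumeration scales exponentially with the number of features, so it is only practical for toy cases like this one; approximation methods are needed at the scale of a 52-feature model.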

Interpretation: Our final model outperformed existing severity and recurrence models; however, it requires external validation. A DOOR output allows specific clinical questions to be asked with explainable predictions that can be feasibly implemented with limited computing resources.

Funding: National Institutes of Health-Institute of Allergy and Infectious Diseases.


Source journal: EBioMedicine (Biochemistry, Genetics and Molecular Biology: General Biochemistry, Genetics and Molecular Biology)

CiteScore: 17.70

Self-citation rate: 0.90%

Articles published: 579

Review time: 5 weeks

About the journal: eBioMedicine is a comprehensive biomedical research journal that covers a wide range of studies that are relevant to human health. Our focus is on original research that explores the fundamental factors influencing human health and disease, including the discovery of new therapeutic targets and treatments, the identification of biomarkers and diagnostic tools, and the investigation and modification of disease pathways and mechanisms. We welcome studies from any biomedical discipline that contribute to our understanding of disease and aim to improve human health.