Development and Internal Validation of an Interpretable Machine Learning Model to Predict Readmissions in a United States Healthcare System

Impact factor: 2.8; JCR Q2, Computer Science, Interdisciplinary Applications

Informatics. Published: 2023-03-27. DOI: 10.3390/informatics10020033

Amanda L. Luo, Akshay Ravi, Simone Arvisais-Anhalt, Anoop Muniyappa, Xinran Liu, Sha Wang



Abstract

(1) Background: One in four hospital readmissions is potentially preventable. Machine learning (ML) models have been developed to predict hospital readmissions and risk-stratify patients, but so far they have been limited in clinical applicability, timeliness, and generalizability. (2) Methods: Using deidentified clinical data from the University of California, San Francisco (UCSF) between January 2016 and November 2021, we developed and compared four supervised ML models (logistic regression, random forest, gradient boosting, and XGBoost) to predict 30-day readmissions for adults admitted to a UCSF hospital. (3) Results: Of 147,358 inpatient encounters, 20,747 (13.9%) patients were readmitted within 30 days of discharge. The final model selected was XGBoost, which achieved an area under the receiver operating characteristic curve (AUROC) of 0.783 and an area under the precision-recall curve (AUPRC) of 0.434. According to Shapley Additive Explanations (SHAP), the most important features were days since last admission, discharge department, and inpatient length of stay. (4) Conclusions: We developed and internally validated a supervised ML model to predict 30-day readmissions in a US-based healthcare system. The model's advantages include state-of-the-art performance metrics, the use of clinical data, reliance on features available within 24 h of discharge, and generalizability to multiple disease states.
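The outcome and the two headline metrics in the abstract can be made concrete with a short sketch. It is not the authors' pipeline: it assumes a hypothetical encounter record (`Encounter`, holding a patient ID plus admission and discharge dates) and a simple labeling rule in which an encounter is positive if the same patient is admitted again within 30 days of its discharge. Any exclusions the study may apply (for example, planned readmissions or transfers) are not reflected here. AUROC is computed from ranks as the normalized Mann-Whitney U statistic, and AUPRC is estimated as average precision, which is one common estimator; the authors may have used another. All names are illustrative and only the standard library is used.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Sequence


@dataclass
class Encounter:
    # Hypothetical minimal encounter record; field names are illustrative only.
    patient_id: str
    admit: date
    discharge: date


def label_30day_readmissions(encounters: List[Encounter], window_days: int = 30) -> List[int]:
    """Label each encounter 1 if the same patient's next admission starts
    0..window_days days after this discharge, else 0 (assumed rule).
    Labels are returned in the same order as `encounters`."""
    labels = [0] * len(encounters)
    by_patient: Dict[str, List[int]] = {}
    for i, enc in enumerate(encounters):
        by_patient.setdefault(enc.patient_id, []).append(i)
    for idxs in by_patient.values():
        idxs.sort(key=lambda i: encounters[i].admit)  # chronological per patient
        for cur, nxt in zip(idxs, idxs[1:]):
            # Only the next admission is checked; overlapping stays (negative gap) are ignored.
            gap = (encounters[nxt].admit - encounters[cur].discharge).days
            if 0 <= gap <= window_days:
                labels[cur] = 1
    return labels


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via average ranks: P(random positive outscores random negative), ties = 1/2."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for a block of tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.
    Tied scores keep input order (stable sort), a simplification."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / k  # precision at this cutoff
    return total / n_pos
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` is 0.75 and `average_precision` on the same inputs is 5/6. When reading the reported numbers, note that a model ranking encounters at random has an expected AUROC of 0.5 regardless of prevalence, but an expected AUPRC equal to the positive rate; with a readmission rate of roughly 14%, that baseline is the context for the reported AUPRC of 0.434.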


