Diego Halac, Cecilia Cocucci, Sebastian Camerlingo
Bioinformatics Advances, volume 5, issue 1, vbaf121. Published 2025-05-24. DOI: 10.1093/bioadv/vbaf121 (https://doi.org/10.1093/bioadv/vbaf121). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12158157/pdf/
Predictive machine learning model for 30-day hospital readmissions in a tertiary healthcare setting.
Motivation: Hospital readmissions represent a major challenge for healthcare systems because of their impact on patient outcomes and associated costs. As many readmissions are considered preventable, predictive modeling offers a valuable tool for early identification and intervention. This study aimed to develop and validate a predictive model for 30-day readmissions in a 200-bed community hospital in Argentina. A retrospective analysis was conducted on 3388 adult admissions. The primary endpoint was readmission within 30 days of discharge. Predictor variables included demographic and clinical factors such as age, length of stay, hypertension, diabetes, heart failure, coronary artery disease, stroke, cancer, dementia, chronic kidney disease, chronic obstructive pulmonary disease, and bedridden status. Three models were developed: Logistic Regression (LR), Random Forest (RF), and LightGBM (LGBM), each with hyperparameter tuning via Bayesian optimization. Model performance was assessed using calibration, discrimination (C-statistics), and decision curve analysis. Internal validation was performed using 250 bootstrap resamples.
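The combination of a C-statistic with bootstrap internal validation can be illustrated with a short sketch. The abstract does not describe the exact procedure, so the code below assumes the standard optimism bootstrap: refit the model on each resample, measure how much better it scores on its own resample than on the original data, and subtract the average of that gap from the apparent C-statistic. It is written in Python for illustration only (the study itself was implemented in R); `fit`, `toy_fit`, and the data are hypothetical stand-ins, not the authors' models or dataset.

```python
import random


def c_statistic(y, scores):
    """Concordance: share of (event, non-event) pairs in which the event case
    has the higher score; ties count as one half."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_c(X, y, fit, n_boot=250, seed=0):
    """Apparent C-statistic minus the mean bootstrap optimism.

    `fit(X, y)` is a hypothetical callable that trains a model and returns
    a function mapping one row of X to a risk score.
    """
    rng = random.Random(seed)
    n = len(y)
    model = fit(X, y)
    apparent = c_statistic(y, [model(row) for row in X])

    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that contain a single class
        boot_model = fit(Xb, yb)
        c_boot = c_statistic(yb, [boot_model(row) for row in Xb])
        c_orig = c_statistic(y, [boot_model(row) for row in X])
        optimism.append(c_boot - c_orig)
    return apparent - sum(optimism) / n_boot


def toy_fit(X, y):
    """Hypothetical one-predictor 'model': orient the predictor so that
    higher scores go with the class that has the higher mean."""
    mean1 = sum(r[0] for r, yi in zip(X, y) if yi == 1) / sum(y)
    mean0 = sum(r[0] for r, yi in zip(X, y) if yi == 0) / (len(y) - sum(y))
    sign = 1.0 if mean1 >= mean0 else -1.0
    return lambda row: sign * row[0]


# Made-up data, for illustration only (not the study's dataset).
X = [[float(i)] for i in range(20)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(optimism_corrected_c(X, y, toy_fit, n_boot=250, seed=1))
```

A flexible learner such as a random forest typically shows a much larger gap between resample and original-data performance than this toy model, which is why the corrected rather than the apparent C-statistic is the figure usually reported.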
Results: The readmission rate was 11% (n = 394). RF outperformed LR and LGBM in discrimination and in clinical utility across predicted-probability thresholds of 0.05 to 0.25. Optimism-corrected C-statistics were 0.60 (LR, LGBM) and 0.64 (RF); calibration slopes were 0.75 (LR), 1.13 (RF), and 1.76 (LGBM). Machine learning models, particularly RF, can improve readmission risk prediction and inform targeted healthcare interventions.
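Clinical utility in decision curve analysis is usually summarized as net benefit at each threshold probability: true positives per patient minus false positives per patient, the latter weighted by the odds of the threshold. The following Python sketch shows that calculation under the assumption that a patient is flagged when the predicted risk is at or above the threshold. The function names and the data are hypothetical and are not taken from the study, which used the dcurves package in R.

```python
def net_benefit(y, p, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y, threshold):
    """Reference strategy: flag every patient regardless of predicted risk."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes and predicted risks, for illustration only.
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p = [0.30, 0.08, 0.20, 0.04, 0.03, 0.02, 0.02, 0.01, 0.01, 0.01]

# Threshold range reported in the abstract (0.05 to 0.25).
for t in (0.05, 0.10, 0.15, 0.20, 0.25):
    print(t, net_benefit(y, p, t), net_benefit_treat_all(y, t))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero, the net benefit of flagging no one.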
Availability and implementation: The dataset and code used to develop and validate the predictive models are available from the corresponding author upon reasonable request. The implementation was conducted in R using the mlr3verse, pminternal, rms, dcurves, data.table, tidyverse, ranger and lightgbm packages, with Bayesian hyperparameter optimization via mlr3mbo.