Leveraging a machine learning model to predict hospital readmission risk: integrating clinical and social determinants of health data.

IF 3.4 | CAS Zone 3 (Medicine) | JCR Q2, PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH
Frontiers in Public Health | Pub Date: 2026-04-22 | eCollection Date: 2026-01-01 | DOI: 10.3389/fpubh.2026.1754585
Tianyu Zhang
Frontiers in Public Health, vol. 14, article 1754585. Journal Article.
Citations: 0

Abstract

Background: Hospital readmissions remain a major challenge for healthcare systems, contributing to higher costs and worse patient outcomes. Although most prediction models rely primarily on clinical data, integrating social determinants of health (SDOH) may improve risk assessment. However, the use of machine learning (ML) to combine clinical and SDOH data for readmission prediction remains limited.

Objective: To develop and compare machine learning models for predicting 30-day hospital readmission by integrating clinical and SDOH data.
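
The abstract does not spell out how the 30-day outcome was labeled. As a minimal sketch, assuming a simplified rule (an index discharge is positive if the same patient has another admission starting 0 to 30 days after it) and a hypothetical record layout of (patient_id, admit_date, discharge_date), the label could be derived from admission records like this:

```python
from datetime import date

# ASSUMPTION: hypothetical record layout (patient_id, admit_date, discharge_date)
# and a simplified rule; the study's exact readmission definition is not given.
def label_30day_readmission(admissions, window_days=30):
    """Map (patient_id, discharge_date) -> 1 if a later admission for the same
    patient starts 0..window_days days after that discharge, else 0."""
    by_patient = {}
    for pid, admit, discharge in admissions:
        by_patient.setdefault(pid, []).append((admit, discharge))

    labels = {}
    for pid, stays in by_patient.items():
        stays.sort()  # chronological order by admission date
        for i, (_, discharge) in enumerate(stays):
            # Only admissions after this index stay can count as readmissions.
            readmitted = any(
                0 <= (next_admit - discharge).days <= window_days
                for next_admit, _ in stays[i + 1:]
            )
            labels[(pid, discharge)] = int(readmitted)
    return labels


# Toy records for illustration only (not study data).
stays = [
    ("A", date(2022, 1, 3), date(2022, 1, 8)),
    ("A", date(2022, 1, 20), date(2022, 1, 25)),  # 12 days after first discharge
    ("B", date(2022, 3, 1), date(2022, 3, 4)),
    ("B", date(2022, 5, 1), date(2022, 5, 3)),    # 58 days after first discharge
]
labels = label_30day_readmission(stays)
print(labels)
```

Operational definitions used in practice often exclude planned readmissions, transfers, or in-hospital deaths; those rules are study-specific and are not modeled in this sketch.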

Methods: We conducted a retrospective cohort study of 3,018 adult patients discharged from a large academic medical center between January 2022 and December 2023. Clinical variables were extracted from electronic health records and linked, through geocoded residential addresses, to area-level SDOH indicators from publicly available census data, including neighborhood deprivation, median income, and educational attainment. Six tabular ML models were trained and evaluated: Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, and Support Vector Machine. Model performance was assessed using the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision-recall curve (PR-AUC), and the F1-score. SHapley Additive exPlanations (SHAP) were used to assess feature importance.
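
The three evaluation metrics named in the Methods can be illustrated with a minimal, dependency-free sketch. Assumptions: PR-AUC is estimated as average precision (one common estimator; the abstract does not say which was used), tied scores are not merged when computing it, and F1 uses an illustrative 0.5 probability threshold. The function names are hypothetical.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney U statistic), counting ties as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """PR-AUC estimated as average precision: mean precision at the rank of
    each positive after sorting by descending score (ties not merged)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos


def f1_at_threshold(y_true, y_score, threshold=0.5):
    """F1-score after converting probabilities to labels at `threshold`."""
    tp = fp = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels and predicted probabilities (not study data).
y = [1, 0, 1, 0, 0, 1]
s = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(roc_auc(y, s), average_precision(y, s), f1_at_threshold(y, s))
```

The pairwise ROC-AUC loop is O(P×N), which is adequate for a test set of a few hundred patients; scikit-learn's `roc_auc_score`, `average_precision_score`, and `f1_score` provide rank-based, tie-aware implementations for real use.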

Results: Ensemble models outperformed Logistic Regression, with XGBoost achieving the best performance on the test set (ROC-AUC 0.79, 95% CI 0.75-0.82; PR-AUC 0.71). In addition to key clinical variables such as prior admissions and comorbidity burden, SDOH features including neighborhood socioeconomic status and household composition were among the most important predictors.
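
The abstract reports a 95% CI for the test-set ROC-AUC but does not state how it was obtained. As a minimal sketch, assuming a percentile bootstrap over patients (one common choice; DeLong's analytic method is another), such an interval could be computed as follows. The function names, the number of resamples, and the toy data are illustrative assumptions, not the authors' procedure or results.

```python
import random


def roc_auc(y_true, y_score):
    """Mann-Whitney estimate of ROC-AUC (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point ROC-AUC plus a percentile-bootstrap (1 - alpha) interval,
    resampling patients with replacement. ASSUMPTION: labels are 0/1."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both classes in y_true")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that contain only one class
            stats.append(roc_auc(yb, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return roc_auc(y_true, y_score), lo, hi


# Synthetic toy data (not study data): positives score higher on average.
gen = random.Random(42)
y = [1 if gen.random() < 0.3 else 0 for _ in range(150)]
scores = [0.5 * yi + gen.random() for yi in y]
auc, lo, hi = bootstrap_auc_ci(y, scores, n_boot=500)
print(f"ROC-AUC {auc:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Resampling whole patients keeps each label paired with its score; resamples that happen to contain a single class are discarded because ROC-AUC is undefined for them.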

Conclusion: Integrating clinical and SDOH data into ML models improved prediction of 30-day hospital readmission. These findings support moving beyond clinical-only models and suggest that SDOH-informed prediction may help identify high-risk patients earlier and guide more targeted care management.

Source journal
Frontiers in Public Health (Medicine: Public Health, Environmental and Occupational Health)
CiteScore: 4.80 | Self-citation rate: 7.70% | Articles published: 4469 | Review time: 14 weeks
About the journal: Frontiers in Public Health is a multidisciplinary open-access journal which publishes rigorously peer-reviewed research and is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians, policy makers and the public worldwide. The journal aims at overcoming current fragmentation in research and publication, promoting consistency in pursuing relevant scientific themes, and supporting finding dissemination and translation into practice. Frontiers in Public Health is organized into Specialty Sections that cover different areas of research in the field. Please refer to the author guidelines for details on article types and the submission process.