Leveraging a machine learning model to predict hospital readmission risk: integrating clinical and social determinants of health data.
Author: Tianyu Zhang
Journal: Frontiers in Public Health, volume 14, article 1754585 (Journal Article, eCollection)
DOI: https://doi.org/10.3389/fpubh.2026.1754585
Published: 2026-04-22
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13143950/pdf/
Impact factor: 3.4; JCR: Q2 (Public, Environmental & Occupational Health)
Citation count: 0
Abstract
Background: Hospital readmissions remain a major challenge for healthcare systems, contributing to higher costs and worse patient outcomes. Although most prediction models rely primarily on clinical data, integrating social determinants of health (SDOH) may improve risk assessment. However, the use of machine learning (ML) to combine clinical and SDOH data for readmission prediction remains limited.
Objective: To develop and compare machine learning models for predicting 30-day hospital readmission by integrating clinical and SDOH data.
Methods: We conducted a retrospective cohort study of 3,018 adult patients discharged from a large academic medical center between January 2022 and December 2023. Clinical variables were extracted from electronic health records and linked, through geocoded residential addresses, to area-level SDOH indicators from publicly available census data, including neighborhood deprivation, median income, and educational attainment. Six tabular ML models were trained and evaluated: Logistic Regression, Random Forest, XGBoost, LightGBM, CatBoost, and Support Vector Machine. Model performance was assessed using the area under the receiver operating characteristic curve (ROC-AUC), precision-recall AUC (PR-AUC), and F1-score. SHapley Additive exPlanations (SHAP) were used to assess feature importance.
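The linkage step described in the Methods can be illustrated with a minimal sketch. It assumes each patient record already carries a geocoded area identifier (here a hypothetical `tract_id`) and that the census indicators sit in a lookup keyed by that identifier. All field names and values below are invented for illustration and are not taken from the study.

```python
# Hypothetical area-level SDOH lookup keyed by census-tract ID.
# Field names and numbers are illustrative assumptions, not study data.
SDOH_BY_TRACT = {
    "T001": {"deprivation_index": 0.82, "median_income": 31000, "pct_college": 0.18},
    "T002": {"deprivation_index": 0.25, "median_income": 78000, "pct_college": 0.55},
}


def link_sdoh(patients, sdoh_by_tract):
    """Left-join patient rows to area-level SDOH indicators by tract ID.

    Patients whose tract has no match keep None for every SDOH field,
    so the missing linkage stays visible to later imputation or modeling.
    """
    sdoh_fields = sorted({key for row in sdoh_by_tract.values() for key in row})
    linked = []
    for patient in patients:
        area = sdoh_by_tract.get(patient.get("tract_id"), {})
        row = dict(patient)  # copy so the input records are not mutated
        for field in sdoh_fields:
            row[field] = area.get(field)
        linked.append(row)
    return linked


# Hypothetical clinical records with a geocoded tract ID attached.
patients = [
    {"patient_id": 1, "tract_id": "T001", "prior_admissions": 3, "comorbidity_score": 5},
    {"patient_id": 2, "tract_id": "T002", "prior_admissions": 0, "comorbidity_score": 1},
    {"patient_id": 3, "tract_id": "T999", "prior_admissions": 1, "comorbidity_score": 2},
]

linked = link_sdoh(patients, SDOH_BY_TRACT)
for row in linked:
    print(row)
```

A left join is used so that patients with an unmatched address are retained rather than silently dropped. Whether the study handled unmatched addresses this way is not stated in the abstract.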
Results: Ensemble models outperformed Logistic Regression, with XGBoost achieving the best performance on the test set (ROC-AUC 0.79, 95% CI 0.75-0.82; PR-AUC 0.71). In addition to key clinical variables such as prior admissions and comorbidity burden, SDOH features including neighborhood socioeconomic status and household composition were among the most important predictors.
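To make the reported metrics concrete, the following sketch computes ROC-AUC, PR-AUC (as average precision), and F1 on toy data, and attaches a percentile-bootstrap 95% confidence interval to ROC-AUC. The abstract does not say how its confidence interval was obtained, so the bootstrap is an assumption, as are the toy labels, the scores, and the 0.5 threshold. In practice a library such as scikit-learn would normally supply these metrics; plain Python is used here only to keep the example self-contained.

```python
import random


def roc_auc(y_true, y_score):
    """ROC-AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """PR-AUC estimated as the mean precision at each positive, ranked by descending score.

    Tied scores are ordered by input position in this simplified version.
    """
    order = sorted(range(len(y_true)), key=lambda i: -y_score[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive labels")
    return total / hits


def f1_at(y_true, y_score, threshold=0.5):
    """F1-score after thresholding scores (threshold is an illustrative assumption)."""
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_score) if s < threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def bootstrap_ci(y_true, y_score, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample cases with replacement, recompute the metric."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # skip resamples containing a single class
        stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy labels (1 = readmitted within 30 days) and predicted risks, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]

auc = roc_auc(y_true, y_score)
ap = average_precision(y_true, y_score)
f1 = f1_at(y_true, y_score)
ci_low, ci_high = bootstrap_ci(y_true, y_score, roc_auc)
print(f"ROC-AUC {auc:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f}); PR-AUC {ap:.3f}; F1 {f1:.3f}")
```

With only eight toy cases the interval is very wide. The point is the mechanics of resampling, not the numbers.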
Conclusion: Integrating clinical and SDOH data into ML models improved prediction of 30-day hospital readmission. These findings support moving beyond clinical-only models and suggest that SDOH-informed prediction may help identify high-risk patients earlier and guide more targeted care management.
About the journal:
Frontiers in Public Health is a multidisciplinary open-access journal that publishes rigorously peer-reviewed research and is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics, clinicians, policy makers and the public worldwide. The journal aims to overcome the current fragmentation in research and publication, promote consistency in pursuing relevant scientific themes, and support the dissemination of findings and their translation into practice.
Frontiers in Public Health is organized into Specialty Sections that cover different areas of research in the field. Please refer to the author guidelines for details on article types and the submission process.