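An interpretable machine learning study for developing a binary classifier for predicting rehospitalization from skilled nursing facilities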

Zhouyang Lou, Zachary Hass, Nan Kong
Journal: Healthcare analytics (New York, N.Y.), volume 7, Article 100387
Publication type: Journal Article
Publication date: 2025-02-15
DOI: 10.1016/j.health.2025.100387
Source: https://www.sciencedirect.com/science/article/pii/S2772442525000061

Abstract

Reducing hospital readmissions for older adults discharged to a skilled nursing facility (SNF) is important to the United States (U.S.) from both financial and care-quality perspectives. To identify potential risk factors, researchers have used data from claims, national surveys, and administrative databases to train models that predict hospital readmissions occurring within 30 days of discharge. Machine learning techniques hold promise for this binary classification task. However, analysis pipelines are underdeveloped in data balancing, feature selection, and model interpretability. In this paper, we used individual resident-level data from the Long-Term Care Minimum Data Set (MDS) collected from SNFs in a midwestern U.S. state (n = 93,058). We further triangulated these data with publicly available facility quality and staffing data from the Nursing Home Compare tool on Medicare.gov and facility neighborhood data from the National Neighborhood Data Archive. We compared several machine learning models, data balancing techniques, and feature selection methods for the prediction task. We found that XGBoost, combined with Synthetic Minority Oversampling Edited Nearest Neighbor (SMOTE-ENN) to balance the data and hierarchical clustering based on Spearman correlation to select the features, produced the best prediction performance. We then used SHapley Additive exPlanations (SHAP) values to identify the features that contribute most to the performance, and used partial dependence plots to examine curvilinear and moderating relationships between features and the risk of 30-day rehospitalization.
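The abstract names hierarchical clustering on Spearman correlation as the feature selection step but gives no implementation details. The following is a minimal sketch of one common way to do this, not the authors' procedure. It assumes a distance of 1 - |rho| between features, average linkage, a merge threshold of 0.3, and keeping the first feature of each cluster as its representative; all of these choices, the function names, and the synthetic feature names are hypothetical. It uses only the Python standard library so that it runs as-is; numerical libraries such as SciPy offer equivalent routines.

```python
import random
from itertools import combinations


def ranks(values):
    """Average ranks (tied values share the mean rank), as used by Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def cluster_features(columns, threshold):
    """Average-linkage agglomerative clustering on the distance 1 - |rho|."""
    names = list(columns)
    dist = {
        frozenset(pair): 1 - abs(spearman(columns[pair[0]], columns[pair[1]]))
        for pair in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[frozenset((u, v))] for u in a for v in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break  # the closest remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters


def select_representatives(columns, threshold=0.3):
    """Keep the first feature of each cluster (an assumed selection rule)."""
    return [cluster[0] for cluster in cluster_features(columns, threshold)]


# Synthetic demonstration data; the feature names are placeholders.
rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(200)]
columns = {
    "feature_a": base,
    "feature_a_cubed": [v ** 3 for v in base],  # monotone transform of feature_a
    "feature_b": [rng.gauss(0, 1) for _ in range(200)],  # independent noise
}
selected = select_representatives(columns)
print(selected)
```

Because Spearman correlation depends only on ranks, a feature and any strictly increasing transform of it fall into the same cluster, so only one of the two is kept. The threshold controls how aggressively correlated features are merged and would need to be tuned on real data.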