Explainable machine learning model incorporating social determinants of health to predict chronic kidney disease in type 2 diabetes patients.

IF 1.6 Q4 ENDOCRINOLOGY & METABOLISM
Journal of Diabetes and Metabolic Disorders Pub Date : 2025-05-09 eCollection Date: 2025-06-01 DOI:10.1007/s40200-025-01621-9
Md Mohaimenul Islam, Tahmina Nasrin Poly, Arinzechukwu Nkemdirim Okere, Yao-Chin Wang
Journal of Diabetes and Metabolic Disorders 24(1):115. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12064531/pdf/
Citations: 0

Abstract

Background and objectives: Social determinants of health (SDOH) play a critical role in the onset and progression of chronic kidney disease (CKD). Although this role is well established, previous studies have not fully incorporated SDOH when predicting CKD in Type 2 diabetes patients. To bridge this gap, this study aimed to develop and evaluate machine learning (ML) models that incorporate SDOH to enhance CKD risk prediction in Type 2 diabetes patients.

Methods: Data were obtained from the 2023 Behavioral Risk Factor Surveillance System (BRFSS), a national survey that collects comprehensive health-related data from adults across the United States. Missing data were addressed using the K-nearest neighbor imputation method, and the Synthetic Minority Oversampling Technique (SMOTE) was applied to balance class distributions. Potential predictive features were selected using correlation coefficient analysis. The dataset was partitioned into training (80%) and testing (20%) subsets, with a 3-fold cross-validation strategy applied to the training data. Seven ML models were developed for CKD risk prediction, including logistic regression (LR), decision tree (DT), K-nearest neighbor (KNN), random forest (RF), eXtreme Gradient Boosting (XGBoost), and an artificial neural network (ANN). Model performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUROC), precision, recall, F1 score, accuracy, and false positive rate.
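The workflow described in Methods can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the dataset, feature count, neighbor counts, and random-forest settings are all assumptions, and SMOTE is replaced by a simplified hand-written interpolation (`smote_like`, a hypothetical helper) so the example depends only on NumPy and scikit-learn. The abstract does not say whether oversampling preceded the train/test split; the sketch oversamples the training partition only, which is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Synthetic stand-in for the survey data (NOT BRFSS): roughly 15% positives.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.85, 0.15], random_state=0)
# Blank out about 5% of values to mimic missing survey responses.
X[rng.random(X.shape) < 0.05] = np.nan

# 80/20 split, stratified on the outcome.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# K-nearest-neighbor imputation, fitted on the training partition only.
imputer = KNNImputer(n_neighbors=5)
X_tr = imputer.fit_transform(X_tr)
X_te = imputer.transform(X_te)

def smote_like(X_min, n_new, k=5):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority-class neighbors, chosen at random."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop self
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neigh[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[partner] - X_min[base])

# Add synthetic minority samples until both classes are the same size.
n_needed = int((y_tr == 0).sum() - (y_tr == 1).sum())
X_bal = np.vstack([X_tr, smote_like(X_tr[y_tr == 1], n_needed)])
y_bal = np.concatenate([y_tr, np.ones(n_needed, dtype=int)])

# 3-fold cross-validation on the training data, then a held-out test AUROC.
# Caveat: cross-validating already-oversampled data gives optimistic scores.
model = RandomForestClassifier(n_estimators=100, random_state=0)
cv_auc = cross_val_score(model, X_bal, y_bal, cv=3, scoring="roc_auc")
model.fit(X_bal, y_bal)
test_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(cv_auc.round(3), round(test_auc, 3))
```

Keeping the test partition free of synthetic samples means the held-out AUROC reflects the original class balance; in practice a maintained SMOTE implementation would likely be preferable to the simplified helper above.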

Results: The study included 19,912 Type 2 diabetes patients (weighted sample size: 818,878), among whom 2,924 (weighted 13.92%) had CKD, and 16,988 (weighted 86.08%) did not. Over half of the CKD group (50.4%) were aged 65 or older. The proportion of female patients was higher in both groups, comprising 53.8% of the CKD group and 50.5% of the non-CKD group. Among the ML models evaluated, the RF model demonstrated the highest predictive performance for CKD, with an AUROC of 0.89 (95% CI: 0.88-0.90), followed by the DT model (0.84, 95% CI: 0.83-0.85) and XGBoost (0.83, 95% CI: 0.82-0.84). The RF model achieved an accuracy of 0.81 (95% CI: 0.81-0.81), a precision of 0.79 (95% CI: 0.79-0.79), a recall of 0.85 (95% CI: 0.85-0.85), and an F1 score of 0.82 (95% CI: 0.82-0.82). Additionally, the RF model exhibited strong calibration, reinforcing its reliability as a predictive tool for CKD risk in individuals with Type 2 diabetes.
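The reported figures can be cross-checked with simple arithmetic. The short sketch below uses only numbers from the abstract; the variable names are illustrative. It confirms that the two groups sum to the full sample, computes the unweighted CKD share (which differs from the weighted 13.92% because of survey weighting), and recomputes F1 from the stated precision and recall.

```python
# Figures as reported in the abstract.
n_total, n_ckd, n_no_ckd = 19_912, 2_924, 16_988
precision, recall = 0.79, 0.85

# The CKD and non-CKD groups should add up to the full sample.
counts_consistent = (n_ckd + n_no_ckd == n_total)

# Unweighted CKD share of the sample (the abstract reports weighted shares).
unweighted_ckd_share = n_ckd / n_total

# F1 is the harmonic mean of precision and recall: 2PR / (P + R).
f1 = 2 * precision * recall / (precision + recall)

print(counts_consistent, round(unweighted_ckd_share, 4), round(f1, 2))
# prints: True 0.1468 0.82
```

The recomputed F1 of 0.82 matches the value reported for the RF model.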

Conclusion: The study findings underscore the potential of ML models, particularly the RF model, in accurately predicting CKD among individuals with Type 2 diabetes. This approach not only enhances the precision of CKD prediction but also highlights the importance of addressing social and environmental disparities in disease prevention and management. Leveraging ML models with SDOH can lead to earlier interventions, more personalized treatment plans, and improved health outcomes for vulnerable populations.

Supplementary information: The online version contains supplementary material available at 10.1007/s40200-025-01621-9.

Source journal: Journal of Diabetes and Metabolic Disorders (Medicine: Internal Medicine)
CiteScore: 4.80
Self-citation rate: 3.60%
Articles published: 210
Journal description: Journal of Diabetes & Metabolic Disorders is a peer-reviewed journal that publishes original clinical and translational articles and reviews in the field of endocrinology and provides a forum for high-quality debate on these issues. Topics of interest include, but are not limited to, diabetes, lipid disorders, metabolic disorders, osteoporosis, interdisciplinary practices in endocrinology, cardiovascular and metabolic risk, aging research, obesity, traditional medicine, psychosomatic research, behavioral medicine, ethics and evidence-based practices. As of Jan 2018 the journal is published by Springer as a hybrid journal with no article processing charges. All articles published before 2018 are available free of charge on springerlink. Unofficial 2017 2-year Impact Factor: 1.816.