Explainable machine learning model incorporating social determinants of health to predict chronic kidney disease in type 2 diabetes patients.
Authors: Md Mohaimenul Islam, Tahmina Nasrin Poly, Arinzechukwu Nkemdirim Okere, Yao-Chin Wang
Journal: Journal of Diabetes and Metabolic Disorders, 24(1): 115 (published 2025-05-09)
DOI: 10.1007/s40200-025-01621-9
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12064531/pdf/
Background and objectives: Social determinants of health (SDOH) play a critical role in the onset and progression of chronic kidney disease (CKD). Despite the well-established role of SDOH, previous studies have not fully incorporated these factors in predicting CKD in Type 2 diabetes patients. To bridge this gap, this study aimed to develop and evaluate machine learning (ML) models that incorporate SDOH to enhance CKD risk prediction in Type 2 diabetes patients.
Methods: Data were obtained from the 2023 Behavioral Risk Factor Surveillance System (BRFSS), a national survey that collects comprehensive health-related data from adults across the United States. Missing data were addressed using the K-nearest neighbor imputation method, and the Synthetic Minority Oversampling Technique (SMOTE) was applied to balance class distributions. Potential predictive features were selected using correlation coefficient analysis. The dataset was partitioned into training (80%) and testing (20%) subsets, with a 3-fold cross-validation strategy applied to the training data. Seven ML models were developed for CKD risk prediction, including logistic regression (LR), decision tree (DT), K-nearest neighbor (KNN), random forest (RF), eXtreme Gradient Boosting (XGBoost), and an artificial neural network (ANN). Model performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUROC), precision, recall, F1 score, accuracy, and false positive rate.
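The Methods name two k-nearest-neighbour procedures: KNN imputation for missing values and SMOTE for class balancing. The dependency-free Python below is a minimal sketch of how each works, under illustrative assumptions: missing values are encoded as `None`, distances are Euclidean over jointly observed features (similar in spirit to scikit-learn's `nan_euclidean` metric), and the function names, `k` values and seed are hypothetical. It is not the authors' code; the abstract does not report their settings, and a real pipeline would more likely call scikit-learn's `KNNImputer` and imbalanced-learn's `SMOTE`.

```python
import math
import random


def nan_euclidean(a, b):
    """Euclidean distance over features observed (not None) in both rows,
    scaled up by total/observed to compensate for the skipped features."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Replace each None with the mean of that feature over the k nearest
    rows in which the feature is observed."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for idx, other in enumerate(rows)
                if idx != i and other[j] is not None
            )[:k]
            if donors:  # leave the gap if no other row observes this feature
                filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class rows: pick a minority row,
    pick one of its k nearest minority neighbours, and interpolate at a
    random point on the segment between them."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, partner)])
    return synthetic


# Toy data, not from the study: the second row is missing its second feature.
rows = [[1.0, 2.0], [1.1, None], [5.0, 6.0], [0.9, 2.1]]
print(knn_impute(rows, k=2)[1])  # second feature filled from rows 0 and 3

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(len(smote(minority, n_new=4, k=2)))
```

SMOTE adds synthetic rows only to the minority (CKD) class. Oversampling is usually applied to the training portion alone, so the held-out test set keeps the observed class ratio; the abstract does not state which order was used here.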
Results: The study included 19,912 Type 2 diabetes patients (weighted sample size: 818,878), of whom 2,924 (weighted 13.92%) had CKD and 16,988 (weighted 86.08%) did not. Over half of the CKD group (50.4%) were aged 65 or older. Female patients were the majority in both groups, comprising 53.8% of the CKD group and 50.5% of the non-CKD group. Among the ML models evaluated, the RF model demonstrated the highest predictive performance for CKD, with an AUROC of 0.89 (95% CI: 0.88-0.90), followed by the DT model (0.84, 95% CI: 0.83-0.85) and XGBoost (0.83, 95% CI: 0.82-0.84). The RF model achieved an accuracy of 0.81 (95% CI: 0.81-0.81), a precision of 0.79 (95% CI: 0.79-0.79), a recall of 0.85 (95% CI: 0.85-0.85), and an F1 score of 0.82 (95% CI: 0.82-0.82). The RF model also exhibited strong calibration, supporting its reliability as a predictive tool for CKD risk in individuals with Type 2 diabetes.
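Accuracy, precision, recall, F1 and false positive rate all come from a confusion matrix at one decision threshold, whereas AUROC is threshold-free. The plain-Python sketch below uses hypothetical helper names to show the standard definitions, with AUROC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half). It also checks that the reported RF precision (0.79) and recall (0.85) are consistent with the reported F1 of 0.82.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def threshold_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts at one decision threshold."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity, true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall),
        "false_positive_rate": fp / (fp + tn),
    }


def auroc(labels, scores):
    """Rank-based AUROC: share of (positive, negative) pairs in which the
    positive gets the higher score, counting ties as 0.5. O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Consistency check against the reported RF precision and recall.
print(round(f1(0.79, 0.85), 2))  # 0.82

# Toy example, not study data.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is quadratic and meant only to make the definition concrete; for a dataset of this size, scikit-learn's `roc_auc_score` computes the same quantity efficiently.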
Conclusion: The study findings underscore the potential of ML models, particularly the RF model, in accurately predicting CKD among individuals with Type 2 diabetes. This approach not only enhances the precision of CKD prediction but also highlights the importance of addressing social and environmental disparities in disease prevention and management. Leveraging ML models with SDOH can lead to earlier interventions, more personalized treatment plans, and improved health outcomes for vulnerable populations.
Supplementary information: The online version contains supplementary material available at 10.1007/s40200-025-01621-9.
Journal description:
Journal of Diabetes & Metabolic Disorders is a peer-reviewed journal that publishes original clinical and translational articles and reviews in the field of endocrinology and provides a high-quality forum for debate on these issues. Topics of interest include, but are not limited to, diabetes, lipid disorders, metabolic disorders, osteoporosis, interdisciplinary practices in endocrinology, cardiovascular and metabolic risk, aging research, obesity, traditional medicine, psychosomatic research, behavioral medicine, ethics, and evidence-based practices. As of January 2018, the journal is published by Springer as a hybrid journal with no article processing charges. All articles published before 2018 are available free of charge on SpringerLink. Unofficial 2017 two-year Impact Factor: 1.816.