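Evaluation of factors predicting transition from prediabetes to diabetes among patients residing in underserved communities in the United States – A machine learning approach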

IF 6.3 | Zone 2, Medicine | JCR Q1, BIOLOGY
Arinze Nkemdirim Okere, Tianfeng Li, Carlos Theran, Eunice Nyasani, Askal Ayalew Ali
Computers in Biology and Medicine, volume 187, published 2025-02-11 (journal article)
DOI: 10.1016/j.compbiomed.2025.109824
https://www.sciencedirect.com/science/article/pii/S001048252500174X
Citations: 0

Evaluation of factors predicting transition from prediabetes to diabetes among patients residing in underserved communities in the United States – A machine learning approach

Introduction

Over one-third of the population in the United States (US) has prediabetes. Underserved populations in the United States face a higher burden of prediabetes than populations in urban areas, which increases their risk of stroke and heart disease. There is a gap in the literature on early predictors of diabetes among patients with prediabetes living in underserved communities in the United States. Hence, this study's objective is to identify factors influencing the transition from prediabetes to diabetes in rural or underserved communities using a machine learning approach.

Methods

We conducted a retrospective analysis of data from prediabetic patients between 2012 and 2022. Eligible participants were at least 18 years old with baseline HbA1c levels between 5.7 % and 6.4 %. Eleven machine learning algorithms were evaluated using ten-fold cross-validation: Logistic Regression (LR), Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Gaussian Naive Bayes (GaussianNB), Bernoulli Naive Bayes (BernoulliNB), Adaptive Boosting (AdaBoost), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGBoost), and Extra Trees (ET). Subsequently, the SHAP framework was used to assess predictor influence and interactions in the top-performing model.
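
The ten-fold cross-validation step can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the abstract does not say whether the folds were stratified or which library was used, so the stratified splitter, the `fit`/`predict` callables, the placeholder features, and the majority-class stand-in model are all assumptions made for the example. Only the label counts (426 progressors among 1910 patients) come from the abstract.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class ratios kept roughly equal per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold gets a fair share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean held-out accuracy of a model supplied as fit/predict callables."""
    scores = []
    for train_idx, test_idx in stratified_k_fold(y, k=k, seed=seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = predict(model, [X[i] for i in test_idx])
        correct = sum(p == y[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Labels sized like the cohort in the abstract: 426 progressors among 1910 patients.
# The features are placeholders; only the fold structure is illustrated here.
y = [1] * 426 + [0] * (1910 - 426)
X = [[0.0] for _ in y]


# Hypothetical stand-in "model": always predict the training fold's majority class.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)


fold_sizes = [len(test_idx) for _, test_idx in stratified_k_fold(y, k=10)]
baseline_accuracy = cross_val_accuracy(fit_majority, predict_majority, X, y, k=10)
print(fold_sizes)
print(round(baseline_accuracy, 3))
```

Any of the eleven classifiers could be passed in place of the majority-class stand-in by wrapping its training and prediction calls in the two callables. Each patient appears in exactly one held-out fold, so every reported score is computed on data the model did not see during fitting.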

Results

Out of 5816 patients, 1910 met the criteria, with 426 progressing to diabetes. The Random Forest model achieved the highest accuracy (90.0 %) and AUC (0.963), followed by Extra Trees (89.5 % accuracy, AUC 0.962) and XGBoost (88.6 % accuracy, AUC 0.952). Logistic Regression demonstrated lower performance but outperformed other models such as K-Nearest Neighbors and Gaussian Naive Bayes. SHAP analysis with the RF model identified key predictors and their interactions. A significant interaction showed that lower BMI values, combined with increasing age, were associated with a reduced risk of diabetes progression, while higher BMI at younger ages increased the likelihood of progression. Additionally, several social determinants of health were identified as significant predictors.
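
To put the reported accuracy in context, the cohort counts above can be turned into a few simple rates. The sketch below uses only the numbers stated in the Results. The majority-class baseline is an illustrative comparison added here, not something the abstract reports, and it assumes accuracy was measured on the original class distribution (the abstract does not say whether any resampling was applied).

```python
screened = 5816    # patients in the source data
eligible = 1910    # met the inclusion criteria
progressed = 426   # progressed to diabetes
rf_accuracy = 0.900  # Random Forest accuracy reported in the abstract

eligible_share = eligible / screened
progression_rate = progressed / eligible
# Accuracy of a trivial rule that predicts "no progression" for everyone.
majority_baseline = 1 - progression_rate
margin_over_baseline = rf_accuracy - majority_baseline

print(f"eligible share:       {eligible_share:.1%}")        # 32.8%
print(f"progression rate:     {progression_rate:.1%}")      # 22.3%
print(f"majority baseline:    {majority_baseline:.1%}")     # 77.7%
print(f"margin over baseline: {margin_over_baseline:.1%}")  # 12.3%
```

Because only about 22 % of eligible patients progressed, accuracy alone overstates how much a model has learned, which is one reason the AUC values reported alongside it are informative.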

Conclusion

Among the 11 models, the Random Forest model showed the strongest reliability for predicting diabetes progression. The results of this study can inform public policy and the development of early, targeted interventions focusing on social determinants of health, dietary counseling, and BMI management to prevent diabetes in underserved communities.
Source journal: Computers in Biology and Medicine (Engineering and Technology: Biomedical Engineering)
CiteScore: 11.70
Self-citation rate: 10.40%
Articles published: 1086
Review time: 74 days
Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.