A comparative approach of machine learning models to predict attrition in a diabetes management program.

PLOS Digital Health. Published: 2025-07-07; eCollection date: 2025-07-01. DOI: 10.1371/journal.pdig.0000930
Samantha Kanny, Grisha Post, Patricia Carbajales-Dale, William Cummings, Janet Evatt, Windsor Westbrook Sherrill
Volume 4, Issue 7, Article e0000930.

Abstract

Approximately 11.6% of Americans have diabetes, and South Carolina has one of the highest rates of diabetes among adults. Diabetes self-management programs have been shown to promote weight loss and to improve diabetes knowledge and self-care behaviors, so keeping vulnerable individuals in these programs is critical to serving the growing population with diabetes. Machine learning is increasingly used in healthcare settings. The objective of this study is to assess how effectively several machine learning methods predict attrition from a diabetes self-management program, using participant demographics and various evaluation measures. Data were collected from participants enrolled in Health Extension for Diabetes (HED). Descriptive statistics were used to examine HED participant demographics, while Mann-Whitney U tests and chi-square tests were used to examine relationships between demographics and pre-program evaluation measures. Across the analyses, several variables emerged as influential predictors of attrition: health-related measures (specifically SF-12 quality-of-life scores), the Distressed Communities Index (DCI) score, demographic factors (race, age, height, and educational attainment), and a spatial variable (drive time to the nearest grocery store). However, the machine learning models performed poorly overall, with AUC values from 0.53 to 0.64 and F1 scores from 0.19 to 0.36, indicating low predictive power. Among the models tested, XGBoost with downsampling yielded the highest AUC (0.64) and the highest F1 score (0.36). SHAP (SHapley Additive exPlanations) was applied to improve model interpretability. Although these models are not suitable for accurately predicting individual attrition risk in diabetes self-management programs, they identify potential factors that influence dropout. These findings underscore how difficult it is to predict health behavior outcomes accurately and highlight the need for future research on predictive modeling that better supports patient engagement and retention.
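The two headline metrics and the balancing step can be made concrete with a short sketch. The snippet below is a minimal, dependency-free illustration, not the authors' code, and it does not reproduce their XGBoost models. It computes ROC AUC through its equivalence with the normalized Mann-Whitney U statistic (the probability that a randomly chosen dropout receives a higher risk score than a randomly chosen completer, with ties counted as one half), computes F1 at an assumed 0.5 decision threshold, and shows random downsampling of the majority class as one common way to balance training data. The function names, the threshold, and the toy labels and scores are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the normalized Mann-Whitney U statistic.

    Over every (positive, negative) pair, count 1 if the positive
    (dropout) case scores higher, 0.5 on a tie, then divide by the
    number of pairs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    u = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 = 2PR / (P + R) for predictions made at an assumed fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def downsample(rows: List[Tuple[list, int]], seed: int = 0) -> List[Tuple[list, int]]:
    """Randomly drop majority-class rows until both classes are the same size.

    Intended for the training split only; the test split stays untouched
    so that AUC and F1 reflect the real class balance.
    """
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical held-out labels (1 = dropped out) and model risk scores.
    y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    s = [0.7, 0.4, 0.6, 0.3, 0.2, 0.5, 0.1, 0.8, 0.45, 0.35]
    print(f"AUC = {roc_auc(y, s):.3f}")
    print(f"F1  = {f1_score(y, s):.3f}")
```

On the toy data this prints AUC = 0.762 and F1 = 0.571. The gap between the two numbers illustrates why the study reports both: AUC summarizes ranking quality across all thresholds, while F1 depends on a single threshold and on how rare the positive (dropout) class is, which is also why class rebalancing such as downsampling can shift F1 more than AUC.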

