A Comparative Analysis of Machine Learning Models for the Detection of Undiagnosed Diabetes Patients

IF 2.4 Q3 ENDOCRINOLOGY & METABOLISM
S. Cichosz, Clara Bender, Ole Hejlesen
Diabetology. Published 2024-01-03. DOI: 10.3390/diabetology5010001 (https://doi.org/10.3390/diabetology5010001)

Abstract

Introduction: Early detection of type 2 diabetes is essential for preventing long-term complications. However, screening the entire population for diabetes is not cost-effective, so identifying individuals at high risk for this disease is crucial. The aim of this study was to compare the performance of five diverse machine learning (ML) models in classifying undiagnosed diabetes using large heterogeneous datasets. Methods: We used several years of data from the National Health and Nutrition Examination Survey (NHANES), 2005 to 2018, to identify people with undiagnosed diabetes. The dataset included 45,431 participants, and biochemical confirmation of glucose control (HbA1c) was used to identify undiagnosed diabetes. The predictors were simple, clinically obtainable variables that could feasibly be used in prescreening for diabetes. We compared five ML models: random forest, AdaBoost, RUSBoost, LogitBoost, and a neural network. Results: The prevalence of undiagnosed diabetes was 4%. For the classification of undiagnosed diabetes, the areas under the ROC curve (AUCs) were between 0.776 and 0.806. The positive predictive values (PPVs) were between 0.083 and 0.091, the negative predictive values (NPVs) were between 0.984 and 0.99, and the sensitivities were between 0.742 and 0.871. Conclusion: We have demonstrated that several types of classification models can accurately classify undiagnosed diabetes from simple and clinically obtainable variables. These results suggest that machine learning for prescreening for undiagnosed diabetes could be a useful tool in clinical practice.
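The reported PPVs are low even though the sensitivities and NPVs are high, which follows from the 4% prevalence. The sketch below illustrates that relationship with Bayes' rule. It is not the authors' code: the function name `predictive_values` is hypothetical, and the specificity of 0.65 is an assumed value chosen for illustration, since the abstract does not report specificity. The sensitivity of 0.80 is simply a value inside the reported range.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule.

    All inputs are proportions in [0, 1].
    """
    # Expected fraction of the population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Prevalence of 4% is taken from the abstract.
# Sensitivity 0.80 lies inside the reported range (0.742 to 0.871).
# Specificity 0.65 is an ASSUMPTION; the abstract does not report it.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.65, prevalence=0.04)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under these assumed inputs, PPV is about 0.087 and NPV about 0.987, both within the ranges reported above. With only 4 in 100 screened people having undiagnosed diabetes, most positive predictions are false positives even for a reasonably sensitive model, while a negative prediction is almost always correct.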