A Comparative Analysis of Machine Learning Models for the Detection of Undiagnosed Diabetes Patients

Impact factor: 2.2 | JCR Q3, Endocrinology & Metabolism

Diabetology | Publication date: 2024-01-03 | DOI: 10.3390/diabetology5010001

S. Cichosz, Clara Bender, Ole Hejlesen


Citations: 0

Abstract

Introduction: Early detection of type 2 diabetes is essential for preventing long-term complications. However, screening the entire population for diabetes is not cost-effective, so identifying individuals at high risk is crucial. The aim of this study was to compare the performance of five diverse machine learning (ML) models in classifying undiagnosed diabetes using large, heterogeneous datasets.

Methods: We used data from multiple survey years (2005 to 2018) of the National Health and Nutrition Examination Survey (NHANES) to identify people with undiagnosed diabetes. The dataset included 45,431 participants, and biochemical confirmation of glucose control (HbA1c) was used to identify undiagnosed diabetes. The predictors were simple, clinically obtainable variables that could feasibly be used for diabetes prescreening. We compared five ML models: random forest, AdaBoost, RUSBoost, LogitBoost, and a neural network.

Results: The prevalence of undiagnosed diabetes was 4%. For the classification of undiagnosed diabetes, the area under the ROC curve (AUC) ranged from 0.776 to 0.806. The positive predictive values (PPVs) ranged from 0.083 to 0.091, the negative predictive values (NPVs) from 0.984 to 0.99, and the sensitivities from 0.742 to 0.871.

Conclusion: We have demonstrated that several types of classification models can accurately classify undiagnosed diabetes from simple and clinically obtainable variables. These results suggest that machine learning could be a useful tool for prescreening for undiagnosed diabetes in clinical practice.
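The reported PPVs are low and the NPVs high largely because undiagnosed diabetes is rare in the sample (4%). The sketch below illustrates that relationship with Bayes' rule. It is not the authors' code: the function name `predictive_values` is hypothetical, the sensitivity of 0.80 is simply a value inside the reported range, and the specificity of 0.65 is an assumed figure, since the abstract does not report specificity.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule.

    All three inputs are proportions strictly between 0 and 1.
    """
    # Expected fraction of the population in each confusion-matrix cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)

    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative result)
    return ppv, npv


# Prevalence 0.04 is from the abstract; sensitivity 0.80 lies within the
# reported range; specificity 0.65 is an ASSUMPTION made for illustration.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.65, prevalence=0.04)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.087, NPV = 0.987
```

Under these assumed inputs, the predictive values land close to the reported ranges, which shows how a classifier with moderate discrimination can still yield a PPV below 0.1 when prevalence is only 4%.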


Source journal

Diabetology

CiteScore

2.50

Self-citation rate

0.00%

Articles published