Authors: Xiaodong Zhang, Weidong Yao, Dawei Wang, Wenqi Hu, Guang Zhang, Yongsheng Zhang
Journal: Diabetes/Metabolism Research and Reviews, 40(8), published 2024-11-04 (Journal Article)
Impact factor: 4.6 (JCR Q1, Endocrinology & Metabolism)
DOI: 10.1002/dmrr.70003
URL: https://onlinelibrary.wiley.com/doi/10.1002/dmrr.70003
Development and Validation of Machine Learning Models for Identifying Prediabetes and Diabetes in Normoglycemia
Background
Prediabetes and diabetes are both states of abnormal glucose metabolism (AGM) that can lead to severe complications. Early detection of AGM is crucial for timely intervention and treatment. However, fasting blood glucose (FBG), when used as a mass population screening method, may fail to identify individuals who have AGM despite normoglycemia. This study aimed to develop and validate machine learning (ML) models that identify AGM among individuals with normoglycemia using routine health check-up indicators.
Methods
Following the American Diabetes Association (ADA) criteria, participants with normoglycemia (FBG ≤ 5.6 mmol/L) were collected from 2019 to 2023 and divided into AGM and Normal groups using a glycosylated haemoglobin (HbA1c) level of 5.7% as the threshold. Data from 2019 to 2022 were divided into training and internal validation sets at a 7:3 ratio, while data from 2023 were used as the external validation set. Seven ML algorithms were used to build models for identifying AGM in the normoglycemic population: logistic regression (LR), random forest (RF), support vector machine (SVM), extreme gradient boosting machine (XGBoost), multilayer perceptron (MLP), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPR). The feature contributions to the optimal model were visualised using SHapley Additive exPlanations (SHAP). Finally, an intuitive and user-friendly interactive interface was developed.
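The cohort definition and data split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Record` fields, the function names, the random (unstratified) 7:3 shuffle, and the reading of the HbA1c threshold as "≥ 5.7% means AGM" are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass

FBG_MAX_MMOL_L = 5.6  # normoglycemia cut-off stated in the Methods
HBA1C_AGM_PCT = 5.7   # HbA1c threshold separating AGM from Normal


@dataclass
class Record:
    """One health check-up visit (hypothetical minimal schema)."""
    year: int     # calendar year of the check-up
    fbg: float    # fasting blood glucose, mmol/L
    hba1c: float  # glycosylated haemoglobin, %


def label_agm(rec: Record) -> int:
    """Return 1 for AGM and 0 for Normal.

    Assumption: values at or above the threshold count as AGM.
    """
    return int(rec.hba1c >= HBA1C_AGM_PCT)


def split_cohort(records, train_frac=0.7, seed=0):
    """Split normoglycemic records into training, internal and external sets.

    Records from 2019 to 2022 are shuffled and cut at train_frac
    (assumed here to be a simple random split); records from 2023
    form the temporally separate external validation set.
    """
    eligible = [r for r in records if r.fbg <= FBG_MAX_MMOL_L]
    development = [r for r in eligible if 2019 <= r.year <= 2022]
    external = [r for r in eligible if r.year == 2023]

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(development)
    cut = round(len(development) * train_frac)
    return development[:cut], development[cut:], external
```

Holding out the most recent year, rather than a random slice, means the external set tests whether a model still works on participants examined after the training period.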
Results
A total of 59,259 participants were enrolled in this study and divided into a training set of 32,810, an internal validation set of 14,060, and an external validation set of 12,389. The CatBoost model outperformed the others, with auROC values of 0.806 and 0.794 on the internal and external validation sets, respectively. Age was the most important feature influencing the performance of the CatBoost model, followed by fasting blood glucose, red blood cells, haemoglobin, body mass index, and the triglyceride-glucose index.
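For readers less familiar with the two metrics, the sketch below computes them from scratch on a toy example. It is a minimal illustration, not the evaluation code used in the study: auROC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count as one half), and auPR is approximated by average precision, which is one common estimator and an assumption here. The function names are hypothetical.

```python
def au_roc(labels, scores):
    """auROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k of positives.

    Simplification: tied scores are kept in sorted order rather than grouped.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision among the top-k predictions
    return total / n_pos


# Toy example: two AGM cases (label 1) and two Normal cases (label 0).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(au_roc(labels, scores))             # 0.75
print(average_precision(labels, scores))  # 5/6, about 0.833
```

An auROC of about 0.8, as reported above, therefore means that in roughly four out of five such pairs the model assigns the higher score to the participant with AGM.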
Conclusion
A well-performing ML model for identifying AGM in the normoglycemic population was built, offering significant potential for early intervention in and treatment of AGM cases that would otherwise have been missed.
About the journal:
Diabetes/Metabolism Research and Reviews is a premier endocrinology and metabolism journal esteemed by clinicians and researchers alike. Encompassing a wide spectrum of topics including diabetes, endocrinology, metabolism, and obesity, the journal eagerly accepts submissions ranging from clinical studies to basic and translational research, as well as reviews exploring historical progress, controversial issues, and prominent opinions in the field. Join us in advancing knowledge and understanding in the realm of diabetes and metabolism.