Authors: Zailing Xing, Henian Chen, Amy C Alman
Journal: AIMS Public Health, volume 11, issue 2, pages 667-687 (Journal Article)
Published: 2024-05-09
DOI: https://doi.org/10.3934/publichealth.2024034
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11252584/pdf/
Citation count: 0
Abstract
Discriminating insulin resistance in middle-aged nondiabetic women using machine learning approaches.
Objective: We employed machine learning algorithms to discriminate insulin resistance (IR) in middle-aged nondiabetic women.
Methods: Data came from the National Health and Nutrition Examination Survey (2007-2018). The study subjects were 2084 nondiabetic women aged 45-64. The analysis included 48 predictors. We randomly divided the data into training (n = 1667) and testing (n = 417) datasets. Four machine learning techniques were employed to discriminate IR: extreme gradient boosting (XGBoost), random forest (RF), gradient boosting machine (GBM), and decision tree (DT). The area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were compared to select the optimal technique.
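The split sizes and performance metrics named in the Methods can be illustrated with a minimal standard-library sketch. It is not the authors' code: the 0.8 training fraction is inferred from the reported sizes (1667 of 2084), and the function names, the seed, and the toy labels and scores are hypothetical. No NHANES data are used.

```python
import random


def split_indices(n, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists.

    train_fraction=0.8 is an assumption inferred from the reported sizes.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Compute threshold-based metrics for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    ppv = _ratio(tp, tp + fp)           # positive predictive value (precision)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return _ratio(wins, len(pos) * len(neg))


# Reproduce the reported split sizes for 2084 subjects.
train_idx, test_idx = split_indices(2084)
print(len(train_idx), len(test_idx))

# Toy labels and predicted probabilities (hypothetical, not study data).
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]
print(classification_metrics(toy_labels, toy_preds))
print(roc_auc(toy_labels, toy_scores))
```

The pairwise AUC is quadratic in sample size, which is fine for a test set of a few hundred subjects. The AUC does not depend on a classification cutoff, whereas the other six metrics do; the 0.5 cutoff used for the toy predictions is an assumption.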
Results: Using all 48 predictors, the XGBoost algorithm achieved an AUC of 0.93 in the training dataset and 0.86 in the testing dataset for discriminating IR, ahead of the RF, GBM, and DT models. After the models were rebuilt with only the top five predictors, XGBoost remained the optimal prediction model, with an AUC of 0.90 (training dataset) and 0.86 (testing dataset). The SHapley Additive exPlanations (SHAP) values revealed the associations between the five predictors and IR, namely BMI (strongly positive impact on IR), fasting glucose (strongly positive), HDL-C (medium negative), triglycerides (medium positive), and glycohemoglobin (medium positive). The threshold values for identifying IR were 29 kg/m², 100 mg/dL, 54.5 mg/dL, 89 mg/dL, and 5.6% for BMI, glucose, HDL-C, triglycerides, and glycohemoglobin, respectively.
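As a rough illustration of how the five reported thresholds might be read, the sketch below flags each marker that falls on the side associated with IR. The abstract gives only the cutoff values, so the comparison directions (inferred from the sign of each SHAP association) and the inclusive handling of values exactly at a cutoff are assumptions. The dictionary keys, the function name, and the example measurements are hypothetical. This is a per-marker check, not the XGBoost model and not a diagnostic rule.

```python
# Cutoffs as reported in the abstract. The "high"/"low" side is an assumption
# inferred from the positive or negative SHAP association of each marker.
THRESHOLDS = {
    "bmi": (29.0, "high"),              # kg/m^2
    "glucose": (100.0, "high"),         # mg/dL, fasting
    "hdl_c": (54.5, "low"),             # mg/dL, negatively associated with IR
    "triglycerides": (89.0, "high"),    # mg/dL
    "glycohemoglobin": (5.6, "high"),   # percent
}


def threshold_flags(measurements):
    """Return, per marker, whether the value lies on the IR-associated side.

    Values exactly at the cutoff are counted as flagged (an assumption).
    """
    flags = {}
    for name, (cutoff, risky_side) in THRESHOLDS.items():
        value = measurements[name]
        if risky_side == "high":
            flags[name] = value >= cutoff
        else:
            flags[name] = value <= cutoff
    return flags


# Hypothetical measurements for illustration only (not study data).
example = {
    "bmi": 31.2,
    "glucose": 96.0,
    "hdl_c": 48.0,
    "triglycerides": 120.0,
    "glycohemoglobin": 5.4,
}
flags = threshold_flags(example)
print(flags)
print(sum(flags.values()), "of", len(flags), "markers flagged")
```

How the flags should be combined is left open here, because the abstract does not describe a combination rule; the model itself weighs the five predictors jointly.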
Conclusion: The XGBoost algorithm demonstrated superior performance metrics for discriminating IR in middle-aged nondiabetic women, with BMI, glucose, HDL-C, glycohemoglobin, and triglycerides as the top five predictors.