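Prediction of Insulin Resistance in Nondiabetic Population Using LightGBM and Cohort Validation of Its Clinical Value: Cross-Sectional and Retrospective Cohort Study.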

IF 3.8 | Zone 3 (Medicine) | JCR Q2, Medical Informatics

JMIR Medical Informatics Pub Date : 2025-06-13 DOI:10.2196/72238

Ting Peng, Rujia Miao, Hao Xiong, Yanhui Lin, Duzhen Fan, Jiayi Ren, Jiangang Wang, Yuan Li, Jianwen Chen

JMIR Medical Informatics, volume 13, e72238. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12180673/pdf/ Article page: https://doi.org/10.2196/72238

Citations: 0

Abstract


Prediction of Insulin Resistance in Nondiabetic Population Using LightGBM and Cohort Validation of Its Clinical Value: Cross-Sectional and Retrospective Cohort Study.

Background: Insulin resistance (IR), a precursor to type 2 diabetes and a major risk factor for various chronic diseases, is becoming increasingly prevalent in China due to population aging and unhealthy lifestyles. Current methods, such as the gold-standard hyperinsulinemic-euglycemic clamp, have limitations in practical application. More convenient and efficient methods to predict and manage IR in nondiabetic populations would have value for prevention and control.

Objective: This study aimed to develop and validate a machine learning prediction model for IR in a nondiabetic population, using low-cost diagnostic indicators and questionnaire surveys.

Methods: A cross-sectional study was conducted for model development, and a retrospective cohort study was used for validation. Data from 17,287 adults with normal fasting blood glucose who underwent physical exams and completed surveys at the Health Management Center of Xiangya Third Hospital, Central South University, from January 2018 to August 2022, were analyzed. IR was assessed using the Homeostasis Model Assessment (HOMA-IR) method. The dataset was split into 80% (13,128/16,411) training and 20% (3283/16,411) testing. A total of 5 machine learning algorithms, namely random forest, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting, Gradient Boosting Machine, and CatBoost, were used. Model optimization included resampling, feature selection, and hyperparameter tuning. Performance was evaluated using F1-score, accuracy, sensitivity, specificity, area under the curve (AUC), and Kappa value. Shapley Additive Explanations analysis was used to assess feature importance. To investigate clinical implications, a different retrospective cohort of 20,369 nondiabetic participants (from the Xiangya Third Hospital database between January 2017 and January 2019) was used for time-to-event analysis with Kaplan-Meier survival curves.
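The evaluation metrics named in the Methods can be illustrated with a short Python sketch. This is not the authors' code: the function names and the example counts are hypothetical, and the abstract does not state the HOMA-IR cutoff used to label IR, so none is assumed here. The HOMA-IR formula shown is the standard one (fasting insulin in µU/mL times fasting glucose in mmol/L, divided by 22.5).

```python
def homa_ir(fasting_insulin_uU_ml: float, fasting_glucose_mmol_l: float) -> float:
    """Standard HOMA-IR: insulin (uU/mL) * glucose (mmol/L) / 22.5."""
    return fasting_insulin_uU_ml * fasting_glucose_mmol_l / 22.5


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the abstract's threshold-based metrics from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall for the IR-positive class
    specificity = tn / (tn + fp)   # recall for the IR-negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what the marginal totals alone would give.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

AUC is omitted because it depends on the full ranking of predicted probabilities rather than on a single confusion matrix.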

Results: Data from 16,411 nondiabetic individuals were analyzed. We randomly selected 13,128 participants for the training group and 3283 participants for the validation group. The final model included 34 lifestyle-related questionnaire features and 17 biochemical markers. In the validation group, the AUCs of all models were greater than 0.90. In the test group, all AUCs were also greater than 0.80. The LightGBM model showed the best IR prediction performance, with an accuracy of 0.7542, sensitivity of 0.6639, specificity of 0.7642, F1-score of 0.6748, Kappa value of 0.3741, and AUC of 0.8456. The top 10 features were BMI, fasting blood glucose, high-density lipoprotein cholesterol, triglycerides, creatinine, alanine aminotransferase, sex, total bilirubin, age, and albumin/globulin ratio. In the validation cohort, all participants were separated into a high-risk IR group and a low-risk IR group according to the LightGBM algorithm. Of 5101 high-risk IR participants, 235 (4.6%) developed diabetes, compared with 137 (0.9%) of 15,268 low-risk IR participants. This resulted in a hazard ratio of 5.1, indicating a significantly higher risk for the high-risk IR group.
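As a rough consistency check on the cohort figures above, the crude cumulative incidence in each group and their ratio can be computed directly from the reported counts. This Python sketch is an illustration only: a crude risk ratio ignores follow-up time and censoring, so it is not the same quantity as the reported hazard ratio, and the abstract does not say how that hazard ratio was estimated. The function name is hypothetical.

```python
def cumulative_incidence(events: int, participants: int) -> float:
    """Share of participants who developed the outcome (no time-to-event adjustment)."""
    return events / participants


# Counts as reported in the Results.
high_risk = cumulative_incidence(235, 5101)
low_risk = cumulative_incidence(137, 15268)
crude_risk_ratio = high_risk / low_risk

print(f"High-risk IR group: {high_risk:.1%}")        # 4.6%
print(f"Low-risk IR group:  {low_risk:.1%}")         # 0.9%
print(f"Crude risk ratio:   {crude_risk_ratio:.2f}")  # 5.13
```

The crude ratio of about 5.13 is close to the reported hazard ratio of 5.1, which suggests the percentages and the hazard ratio in the abstract are mutually consistent.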

Conclusions: By leveraging low-cost laboratory indicators and questionnaire data, the LightGBM model effectively predicts IR status in nondiabetic individuals, aiding large-scale IR screening and diabetes prevention. It may become an efficient and practical tool for insulin sensitivity assessment in these settings.


Source journal

JMIR Medical Informatics (Medicine: Health Informatics)

CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research, with a broad readership including clinicians, CIOs, engineers, and industry and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, which is the focus of JMIR), publishes even faster, and also allows papers that are more technical or more formative than what would be published in the Journal of Medical Internet Research.