Yijian Ji, Hongyan Shang, Jing Yi, Wenhui Zang, Wenjun Cao
Title: Machine learning-based models to predict type 2 diabetes combined with coronary heart disease and feature analysis-based on interpretable SHAP
Journal: Acta Diabetologica (JCR Q2, Endocrinology & Metabolism; impact factor 3.1)
Published: 2025-04-01
DOI: https://doi.org/10.1007/s00592-025-02496-1
Citations: 0
Abstract
Background: Type 2 diabetes and coronary heart disease are highly prevalent in the Chinese population and are among the leading causes of mortality. Their co-occurrence is difficult to diagnose and carries a poor prognosis, imposing a significant disease burden. In recent years, machine learning has been widely applied to diagnostic tasks in medicine; however, predictive models for type 2 diabetes complicated by coronary heart disease have suffered from limited predictive performance and from interference by other comorbidities.
Methods: This study aimed to improve the accuracy, sensitivity, specificity, F1 score, and AUC of models that predict the coexistence of diabetes and coronary heart disease. We developed a prediction model using XGBoost combined with SHAP for feature analysis. Through comparative feature selection, hyperparameter optimization, and computational efficiency analysis, we identified the conditions under which the model performed best. External validation on independent datasets supported the model's robustness and generalizability, and its potential for use in clinical practice.
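The five evaluation metrics named in the Methods can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the pairwise definition of AUC used here are assumptions chosen for illustration, and any labels and scores passed in would be made-up inputs rather than study data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, F1 and AUC from predicted scores."""
    # Scores at or above the (assumed) threshold are predicted positive.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "auc": roc_auc(y_true, y_score),
    }
```

Accuracy, sensitivity, specificity, and F1 all depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is one reason the two kinds of metric can move differently between internal and external validation.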
Results: This study compared three models (Random Forest, LightGBM, and XGBoost) and found that XGBoost performed best in both predictive performance and computational efficiency. The accuracy (Acc) of the XGBoost model was 0.8910, which improved to 0.8942 after hyperparameter tuning. External validation using datasets from Pingyang Hospital and Heji Hospital in Shanxi Province, China, yielded an AUC of 0.7897, supporting the model's generalizability. By integrating SHAP (SHapley Additive exPlanations) for interpretability, the study identified bilirubin levels, basophil count, cholesterol levels, and age as key features for predicting the coexistence of type 2 diabetes mellitus (T2DM) and coronary heart disease (CHD). These findings are consistent with the feature importance rankings produced by the XGBoost algorithm. The model demonstrates moderate predictive performance (AUC = 0.7879 in external validation) with practical interpretability, offering potential utility in improving diagnostic efficiency for T2DM-CHD comorbidity in resource-limited settings. However, its clinical implementation requires further validation in diverse populations.
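To make the idea behind SHAP attributions concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets. It is an illustration only, not the authors' pipeline: the SHAP library uses specialized algorithms for tree models, whereas this sketch substitutes baseline values for "absent" features and enumerates all subsets. The `toy_risk_score` function, its coefficients, and the `instance` and `baseline` values are hypothetical and were not taken from the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Mapping


def shapley_values(
    model: Callable[[Mapping[str, float]], float],
    instance: Mapping[str, float],
    baseline: Mapping[str, float],
) -> Dict[str, float]:
    """Exact Shapley value of each feature for one prediction.

    Features in a coalition take their value from `instance`; all other
    features take their value from `baseline`. Cost grows as 2**n, so this
    is only practical for a handful of features.
    """
    features = list(instance)
    n = len(features)

    def value(coalition: frozenset) -> float:
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_risk_score(x: Mapping[str, float]) -> float:
    """Hypothetical score with made-up coefficients, for illustration only."""
    return (
        0.03 * x["age"]
        + 0.4 * x["cholesterol"]
        - 0.05 * x["bilirubin"]
        + 2.0 * x["basophil_count"]
        + 0.01 * x["age"] * x["cholesterol"]  # interaction term
    )


# Made-up example inputs (not patient data).
instance = {"age": 65.0, "cholesterol": 6.0, "bilirubin": 12.0, "basophil_count": 0.05}
baseline = {"age": 50.0, "cholesterol": 5.0, "bilirubin": 10.0, "basophil_count": 0.03}
phi = shapley_values(toy_risk_score, instance, baseline)
```

By construction, the attributions sum to the difference between the model output at `instance` and at `baseline`; this additivity property is what allows SHAP-style values to be read as per-feature contributions to a single prediction.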
About the journal:
Acta Diabetologica is a journal that publishes reports of experimental and clinical research on diabetes mellitus and related metabolic diseases. Original contributions on biochemical, physiological, pathophysiological and clinical aspects of research on diabetes and metabolic diseases are welcome. Reports are published in the form of original articles, short communications and letters to the editor. Invited reviews and editorials are also published. A Methodology forum, which publishes contributions on methodological aspects of diabetes in vivo and in vitro, is also available. The Editor-in-Chief will be pleased to consider articles describing new techniques (e.g., new transplantation methods, metabolic models) of innovative importance in the field of diabetes/metabolism. Finally, workshop reports are also welcome in Acta Diabetologica.