Machine learning-based models to predict type 2 diabetes combined with coronary heart disease and feature analysis-based on interpretable SHAP.

IF 3.1, Zone 3 (Medicine), JCR Q2, Endocrinology & Metabolism
Yijian Ji, Hongyan Shang, Jing Yi, Wenhui Zang, Wenjun Cao
Acta Diabetologica, journal article, published 2025-04-01. DOI: 10.1007/s00592-025-02496-1 (https://doi.org/10.1007/s00592-025-02496-1)
Citations: 0

Abstract

Background: Type 2 diabetes and coronary heart disease are highly prevalent in the Chinese population and are leading causes of mortality. Because it is difficult to diagnose and carries a poor prognosis, the combination of diabetes and coronary heart disease imposes a significant disease burden. In recent years, machine learning has frequently been used for diagnosis in medicine; however, predictive models for type 2 diabetes complicated by coronary heart disease have suffered from low predictive performance and from interference by other comorbidities during prediction.

Methods: This study aims to improve the accuracy, sensitivity, specificity, F1 score, and AUC of models that predict the coexistence of diabetes and coronary heart disease. We developed a prediction model using XGBoost combined with SHAP for feature analysis. Through comparative feature selection, hyperparameter optimization, and computational efficiency analysis, we identified the conditions under which the model performed best. External validation with independent datasets supported the model's robustness and generalizability, and its potential implementation in clinical practice.
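The five metrics named in the Methods can be computed from true labels and predicted scores as in the minimal sketch below. It is an illustration only, not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and F1 at a fixed threshold.

    The default threshold of 0.5 is an assumption, not a value from the paper.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
toy_metrics = classification_metrics(toy_labels, toy_scores)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy, sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC is threshold-free, which is one reason a study might report both kinds of metric.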

Results: This study compared three models (Random Forest, LightGBM, and XGBoost) and found that XGBoost performed best in both predictive performance and computational efficiency. The accuracy (Acc) of the XGBoost model was 0.8910, which improved to 0.8942 after hyperparameter tuning. External validation using datasets from Pingyang Hospital and Heji Hospital in Shanxi Province, China, yielded an AUC of 0.7897, supporting the model's generalizability. By integrating SHAP (SHapley Additive exPlanations) for interpretability, our study identified bilirubin levels, basophil count, cholesterol levels, and age as key features for predicting the coexistence of type 2 diabetes mellitus (T2DM) and coronary heart disease (CHD). These findings are consistent with the feature importance rankings determined by the XGBoost algorithm. The model demonstrates moderate predictive performance (AUC = 0.7879 in external validation) with practical interpretability, offering potential utility in improving diagnostic efficiency for T2DM-CHD comorbidity in resource-limited settings. However, its clinical implementation requires further validation in diverse populations.
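SHAP attributes a prediction to individual features using Shapley values. The sketch below computes exact Shapley values for a tiny toy function by enumerating feature subsets. It is a conceptual illustration under stated assumptions, not the authors' pipeline and not the algorithm the SHAP library applies to tree models: the toy model, the generic feature indices, and the choice to replace "absent" features with a fixed baseline value are all assumptions of this example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one instance by enumerating all subsets.

    Features outside a coalition are set to their baseline value
    (an assumption of this sketch). Cost grows as 2^n, so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Coalition members take the instance's values, the rest the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


toy_x = [1.0, 2.0, 3.0]
toy_baseline = [0.0, 0.0, 0.0]
toy_phi = shapley_values(toy_model, toy_x, toy_baseline)
```

In this toy case the additive feature receives its full effect, the two interacting features split their joint effect equally, and the attributions sum to the difference between the prediction for the instance and the prediction for the baseline. Ranking features by the mean absolute value of such attributions across many instances is a common way to obtain a SHAP-style importance ordering.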

Source journal
Acta Diabetologica (Medicine: Endocrinology & Metabolism)
CiteScore: 7.30
Self-citation rate: 2.60%
Articles published: 180
Review time: 2 months
Journal description: Acta Diabetologica is a journal that publishes reports of experimental and clinical research on diabetes mellitus and related metabolic diseases. Original contributions on biochemical, physiological, pathophysiological and clinical aspects of research on diabetes and metabolic diseases are welcome. Reports are published in the form of original articles, short communications and letters to the editor. Invited reviews and editorials are also published. A Methodology forum, which publishes contributions on methodological aspects of diabetes in vivo and in vitro, is also available. The Editor-in-chief will be pleased to consider articles describing new techniques (e.g., new transplantation methods, metabolic models) of innovative importance in the field of diabetes/metabolism. Finally, workshop reports are also welcome in Acta Diabetologica.