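Interpretable machine learning models for detecting peripheral neuropathy and lower extremity arterial disease in diabetics: an analysis of critical shared and unique risk factors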

IF 3.3 | CAS Zone 3 (Medicine) | JCR Q2, MEDICAL INFORMATICS
Ya Wu, Danmeng Dong, Lijie Zhu, Zihong Luo, Yang Liu, Xiaoyun Xie
Journal: BMC Medical Informatics and Decision Making
Published: 2024-07-22 (Journal Article)
DOI: 10.1186/s12911-024-02595-z (https://doi.org/10.1186/s12911-024-02595-z)
Citations: 0

Abstract

Diabetic peripheral neuropathy (DPN) and lower extremity arterial disease (LEAD) are significant contributors to diabetic foot ulcers (DFUs), which severely affect patients' quality of life. This study aimed to develop machine learning (ML) models to predict DPN and LEAD and to identify both shared and distinct risk factors.

This retrospective study included 479 diabetic inpatients, of whom 215 were diagnosed with DPN and 69 with LEAD. Clinical data and laboratory results were collected for each patient. To identify the most important features, three feature-selection methods were used: mutual information (MI), random forest recursive feature elimination (RF-RFE), and the Boruta algorithm. Predictive models were developed using logistic regression (LR), random forest (RF), and eXtreme Gradient Boosting (XGBoost), with particle swarm optimization (PSO) used to tune their hyperparameters. The SHapley Additive exPlanations (SHAP) method was applied to quantify the importance of each risk factor in the best-performing models.

For diagnosing DPN, the XGBoost model performed best, with a recall of 83.7%, specificity of 86.8%, accuracy of 85.4%, and an F1 score of 83.7%. For diagnosing LEAD, the RF model performed best, with a recall of 85.7%, specificity of 92.9%, accuracy of 91.9%, and an F1 score of 82.8%. SHAP analysis revealed the top five critical risk factors shared by DPN and LEAD: increased urinary albumin-to-creatinine ratio (UACR), elevated glycosylated hemoglobin (HbA1c), elevated serum creatinine (Scr), older age, and carotid stenosis. Distinct risk factors were also identified: decreased serum albumin and lower lymphocyte count were associated with DPN, whereas an elevated neutrophil-to-lymphocyte ratio (NLR) and higher D-dimer levels were associated with LEAD.

This study demonstrated the effectiveness of ML models in predicting DPN and LEAD in diabetic patients and identified significant risk factors. Focusing on the shared risk factors may greatly reduce the prevalence of both conditions, thereby mitigating the risk of developing DFUs.
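The abstract reports recall, specificity, accuracy and F1 score but neither precision nor the underlying confusion matrices. As an illustration only, the minimal Python sketch below shows how these metrics are derived from the four counts of a binary confusion matrix; `ConfusionMatrix` and `metrics` are hypothetical names, not code from the study, and the sketch assumes every denominator is non-zero. It also makes one relationship in the reported figures explicit: because F1 is the harmonic mean of precision and recall, F1 equals recall only when precision equals recall, so the DPN XGBoost model's identical recall and F1 (83.7%) imply a precision of roughly 83.7% as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for binary confusion-matrix counts.

    The positive class is "disease present" (e.g. DPN or LEAD).
    """
    tp: int  # diseased patients the model flags
    fn: int  # diseased patients the model misses
    fp: int  # disease-free patients the model flags
    tn: int  # disease-free patients the model clears


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return the four metrics named in the abstract, plus precision.

    Assumes no denominator is zero (at least one case, one non-case,
    and one positive prediction).
    """
    recall = cm.tp / (cm.tp + cm.fn)        # sensitivity: share of true cases detected
    specificity = cm.tn / (cm.tn + cm.fp)   # share of non-cases correctly cleared
    precision = cm.tp / (cm.tp + cm.fp)     # share of flagged patients who truly have the disease
    accuracy = (cm.tp + cm.tn) / (cm.tp + cm.fn + cm.fp + cm.tn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical test-set counts chosen for round numbers; NOT the study's data.
    example = ConfusionMatrix(tp=40, fn=10, fp=10, tn=140)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.1%}")
```

With these hypothetical counts, recall, precision and F1 all come out at 80.0% because false negatives and false positives are equal (recall and precision then share the same denominator); specificity is 93.3% and accuracy 90.0%.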
Source journal: BMC Medical Informatics and Decision Making
CiteScore: 7.20
Self-citation rate: 5.70%
Articles published: 297
Review time: 1 month
About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.