Interpretable Machine Learning Model for Predicting and Risk Assessment of Diabetic Nephropathy.

IF 3.8 | Zone 3 (Medicine) | JCR Q2 (Medical Informatics)

JMIR Medical Informatics | Pub Date: 2025-09-09 | DOI: 10.2196/64979

Yili Wen, Zhiqiang Wan, Huiling Ren, Xu Wang, Weijie Wang


Citations: 0

Abstract


Interpretable Machine Learning Model for Predicting and Risk Assessment of Diabetic Nephropathy.

Introduction: Diabetic nephropathy (DN), a severe complication of diabetes, is characterized by proteinuria, hypertension, and progressive decline in renal function, and can lead to end-stage renal disease. The International Diabetes Federation projects that by 2045, 783 million people will have diabetes, with 30%-40% of them developing DN. Current diagnostic approaches lack sufficient sensitivity and specificity for early detection and diagnosis, underscoring the need for an accurate, interpretable predictive model to enable timely intervention, reduce cardiovascular risks, and optimize healthcare costs.

Methods: This retrospective cohort study investigated 1,000 patients with type 2 diabetes using electronic medical record data collected between 2015 and 2020. The sample comprised 444 patients with diabetic nephropathy and 556 without. Features covered demographics, clinical metrics such as blood pressure and glucose levels, and renal function markers. Missing values were handled via multiple imputation, and the dataset was balanced using SMOTE. Three machine learning algorithms, namely XGBoost, CatBoost, and LightGBM, were used because of their robustness in handling complex datasets. Model performance was assessed with accuracy, precision, recall, F1 score, specificity, and area under the curve (AUC). Additionally, Explainable Machine Learning (XML) techniques, such as LIME and SHAP, were applied to make the models more transparent and interpretable and to offer insight into their decision-making processes.

Results: XGBoost and LightGBM demonstrated superior performance, with XGBoost achieving the highest accuracy of 86.87%, a precision of 88.90%, a recall of 84.40%, an F1 score of 86.44%, and a specificity of 89.12%. LIME and SHAP analyses showed how individual features contributed to the models' decisions and identified serum creatinine, albumin, and lipoproteins as significant predictors.

Conclusion: The developed machine learning model provides a robust predictive tool for early diagnosis and risk assessment of DN while also ensuring the transparency and interpretability that are crucial for clinical integration. By enabling early intervention and personalized treatment strategies, this model has the potential to improve patient outcomes and optimize healthcare resource utilization.
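The evaluation metrics named in the abstract (accuracy, precision, recall, F1 score, specificity) all derive from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The class name `ConfusionCounts`, the function `classification_metrics`, and the counts used are hypothetical and chosen only for illustration; they are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # DN patients predicted as DN
    fp: int  # non-DN patients predicted as DN
    tn: int  # non-DN patients predicted as non-DN
    fn: int  # DN patients predicted as non-DN


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return the standard metrics; assumes every denominator is non-zero."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": c.tn / (c.tn + c.fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only; these are NOT results from the study.
counts = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
metrics = classification_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

AUC is not included here because it depends on the ranking of predicted probabilities rather than on a single confusion matrix.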


Source journal

JMIR Medical Informatics Medicine-Health Informatics

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, and industry and health informatics professionals. It is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). JMIR Med Inform has a slightly different scope, placing more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR. It also publishes even faster and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.