Predicting diabetic retinopathy based on routine laboratory tests by machine learning algorithms.

IF 2.8 | Zone 3 (Medicine) | Q2 MEDICINE, RESEARCH & EXPERIMENTAL

European Journal of Medical Research Pub Date : 2025-03-18 DOI:10.1186/s40001-025-02442-5

Xiaohua Wan, Ruihuan Zhang, Yanan Wang, Wei Wei, Biao Song, Lin Zhang, Yanwei Hu

European Journal of Medical Research, vol. 30, no. 1, p. 183. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11921716/pdf/

Citation count: 0

Abstract

Objectives: This study aimed to identify risk factors for diabetic retinopathy (DR) and develop machine learning (ML)-based predictive models using routine laboratory data in patients with type 2 diabetes mellitus (T2DM).

Methods: Clinical data from 4259 T2DM inpatients at Beijing Tongren Hospital were analyzed and divided into a model construction data set (N = 3936) and an external validation data set (N = 323). A prediction model based on 39 optimal variables was constructed with the eXtreme Gradient Boosting (XGBoost) algorithm and compared with four other algorithms: support vector machine (SVM), gradient boosting decision tree (GBDT), neural network (NN), and logistic regression (LR). The Shapley Additive exPlanation (SHAP) method was employed to interpret the XGBoost model. External validation was performed to assess model performance.
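SHAP interprets a model by assigning each feature its Shapley value, that is, its marginal contribution to a prediction averaged over all orderings in which features can be added. The following is a minimal sketch of that idea only, not the authors' pipeline: `toy_risk_score`, the feature names `feat_a`, `feat_b`, `feat_c`, and all numeric values are hypothetical stand-ins, and the exact enumeration used here is feasible only for a handful of features (the SHAP library relies on faster model-specific algorithms).

```python
from itertools import permutations
from math import isclose


def toy_risk_score(features):
    """Hypothetical scoring function standing in for a trained model."""
    a = features["feat_a"]
    b = features["feat_b"]
    c = features["feat_c"]
    # Two linear terms plus one interaction, so attributions are not trivial.
    return 0.5 * a + 0.3 * b + 0.2 * a * c


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features are switched from the baseline
    value to the instance value."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical patient and reference point (illustrative numbers only).
instance = {"feat_a": 2.0, "feat_b": 1.0, "feat_c": 3.0}
baseline = {"feat_a": 0.0, "feat_b": 0.0, "feat_c": 0.0}

phi = shapley_values(toy_risk_score, instance, baseline)

# Efficiency property: attributions sum to f(instance) - f(baseline).
gap = toy_risk_score(instance) - toy_risk_score(baseline)
assert isclose(sum(phi.values()), gap, abs_tol=1e-9)
```

In this toy case the interaction term is split evenly between `feat_a` and `feat_c`, which is why neither feature receives all the credit for it.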

Results: DR was present in 47.69% (N = 1877) of T2DM patients in the model construction data set. Among the models tested, the XGBoost model performed best, with an AUC of 0.831, accuracy of 0.757, sensitivity of 0.754, specificity of 0.759, and F1-score of 0.752. SHAP explained feature importance for the XGBoost model and identified key risk factors for DR. External validation yielded an accuracy of 0.650 for the XGBoost model.
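For readers less familiar with the reported metrics, the sketch below shows how each is computed from a confusion matrix and from ranked scores. The confusion-matrix counts, labels, and scores are hypothetical and are not taken from the study. The final check uses only the figures in the abstract and assumes, as an illustration, that the metrics were measured on a sample with the same DR prevalence as the model construction data set; the abstract does not state which subset was used.

```python
from math import isclose


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=75, fp=20, tn=80, fn=25)
auc = auc_from_scores([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.6])

# Figures from the abstract: 1877 DR cases among 3936 patients.
prevalence = 1877 / 3936
# Accuracy is the prevalence-weighted mean of sensitivity and specificity
# (assumes the evaluation sample shares this prevalence).
implied_accuracy = prevalence * 0.754 + (1 - prevalence) * 0.759

print(round(prevalence * 100, 2), round(implied_accuracy, 3))
```

Under that assumption the implied accuracy rounds to the reported 0.757, so the reported sensitivity, specificity, and accuracy are mutually consistent.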

Conclusions: The XGBoost-based prediction model effectively assesses DR risk in T2DM patients using routine laboratory data, aiding clinicians in identifying high-risk individuals and guiding personalized management strategies, especially in medically underserved areas.


Source journal: European Journal of Medical Research (Medicine: Research & Experimental)

CiteScore: 3.20

Self-citation rate: 0.00%

Articles published: 247

Review time: >12 weeks

Journal description: European Journal of Medical Research publishes translational and clinical research of international interest across all medical disciplines, enabling clinicians and other researchers to learn about developments and innovations within these disciplines and across the boundaries between disciplines. The journal publishes high quality research and reviews and aims to ensure that the results of all well-conducted research are published, regardless of their outcome.