Development and Validation of a Machine Learning Algorithm for Predicting Diabetes Retinopathy in Patients With Type 2 Diabetes: Algorithm Development Study.

IF 3.1 | Zone 3, Medicine | Q2, Medical Informatics

JMIR Medical Informatics Pub Date : 2025-02-07 DOI:10.2196/58107

Sunyoung Kim, Jaeyu Park, Yejun Son, Hojae Lee, Selin Woo, Myeongcheol Lee, Hayeon Lee, Hyunji Sang, Dong Keon Yon, Sang Youl Rhee

Volume 13, e58107. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11830482/pdf/


Abstract

Background: Diabetic retinopathy (DR) is the leading cause of preventable blindness worldwide. Machine learning (ML) systems can enhance DR screening in community-based settings. However, the predictive performance and usability of such models have yet to be established.

Objective: This study used data from 3 university hospitals in South Korea to conduct a simple and accurate assessment of ML-based risk prediction for the development of DR that can be universally applied to adults with type 2 diabetes mellitus (T2DM).

Methods: DR was predicted using data from 2 independent electronic medical record datasets: a discovery cohort (1 hospital, n=14,694) and a validation cohort (2 hospitals, n=1856). The primary outcome was the presence of DR at 3 years. Different ML-based models were selected through hyperparameter tuning in the discovery cohort, and the area under the receiver operating characteristic (ROC) curve was analyzed in both cohorts.
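As an illustration of the evaluation metric named in the Methods, the following is a minimal sketch of how the area under the ROC curve can be computed from predicted risk scores. It uses the rank interpretation of the metric: the probability that a randomly chosen patient with DR receives a higher score than a randomly chosen patient without DR. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    This is O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = DR present at 3 years, 0 = absent.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.65, 0.90]
auc = roc_auc(labels, scores)
print(f"AUROC = {auc:.3f}")
```

In the design described above, a function like this would be applied once to the discovery cohort and once to the validation cohort, using the same fitted model.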

Results: Among 14,694 patients screened for inclusion, 348 (2.37%) were diagnosed with DR. In the original (discovery) dataset, the extreme gradient boosting (XGBoost) model had an accuracy of 75.13% (95% CI 74.10-76.17), a sensitivity of 71.00% (95% CI 66.83-75.17), and a specificity of 75.23% (95% CI 74.16-76.31). In the validation dataset, XGBoost had an accuracy of 65.14%, a sensitivity of 64.96%, and a specificity of 65.15%. The top-ranked feature in the XGBoost model was dyslipidemia, followed by cancer, hypertension, chronic kidney disease, neuropathy, and cardiovascular disease.
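To make the reported metrics concrete, here is a minimal sketch of how accuracy, sensitivity, and specificity with 95% CIs can be derived from a confusion matrix, and how sensitivity and specificity translate into a positive predictive value at a given prevalence. The abstract does not state how the intervals were computed, so the normal-approximation (Wald) interval is an assumption, and the confusion-matrix counts are hypothetical. Only the sensitivity, specificity, and prevalence passed to `positive_predictive_value` come from the abstract.

```python
import math
from typing import Dict, NamedTuple


class Metric(NamedTuple):
    estimate: float
    ci_low: float
    ci_high: float


def wald_ci(successes: int, total: int, z: float = 1.96) -> Metric:
    """Proportion with a normal-approximation (Wald) 95% CI (assumed method)."""
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return Metric(p, max(0.0, p - half_width), min(1.0, p + half_width))


def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, Metric]:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": wald_ci(tp + tn, tp + fp + tn + fn),
        "sensitivity": wald_ci(tp, tp + fn),  # true positive rate
        "specificity": wald_ci(tn, tn + fp),  # true negative rate
    }


def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Bayes' rule: P(disease | positive prediction)."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for illustration only (not from the study).
metrics = screening_metrics(tp=71, fp=250, tn=750, fn=29)
for name, m in metrics.items():
    print(f"{name}: {m.estimate:.2%} (95% CI {m.ci_low:.2%}-{m.ci_high:.2%})")

# Reported sensitivity and specificity, at the reported prevalence of 348/14,694.
ppv = positive_predictive_value(0.7100, 0.7523, 348 / 14694)
print(f"PPV = {ppv:.2%}")
```

If the discovery-cohort sensitivity and specificity held at the cohort prevalence of 2.37%, the positive predictive value would be roughly 6.5%. This suggests that a model of this kind would serve to prioritize patients for eye examination rather than to establish a diagnosis.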

Conclusions: This approach shows the potential to enhance patient outcomes by enabling timely interventions in patients with T2DM, improving our understanding of contributing factors, and reducing DR-related complications. The proposed prediction model is expected to be both competitive and cost-effective, particularly for primary care settings in South Korea.


Source journal: JMIR Medical Informatics (Medicine-Health Informatics)

CiteScore: 7.90
Self-citation rate: 3.10%
Articles published: 173
Review time: 12 weeks

About the journal: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals. It is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). JMIR Med Inform has a slightly different scope, placing more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR. It also publishes even faster and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.