Ting-Wei Wang, Wei-Ting Luo, Yu-Kang Tu, Yu-Bai Chou, Yu-Te Wu
Prospective validation of deep-learning algorithms for diabetic retinopathy screening: A systematic review and meta-analysis
Survey of Ophthalmology, Volume 71, Issue 3, Pages 827-846. Published 2026-05-01 (Epub 2025-12-02). Journal Article.
DOI: 10.1016/j.survophthal.2025.11.012
https://www.sciencedirect.com/science/article/pii/S0039625725002267
Abstract
Deep-learning (DL) algorithms are widely promoted for diabetic retinopathy (DR) screening, yet their prospective diagnostic accuracy is not well defined. PubMed, EMBASE, and ClinicalTrials.gov were searched through April 2025 for prospective evaluations of DL systems using color fundus images. Two reviewers screened records, extracted data, and applied QUADAS-2. Hierarchical bivariate random-effects models produced pooled sensitivity and specificity for referable and vision-threatening DR, analyzed separately at patient and eye level. Twenty-one prespecified moderators were explored with uni- and multivariate meta-regression; publication bias was assessed with Deeks’ test. Seventy-three studies from 23 countries (255,330 examinations) met the criteria. Pooled patient-level sensitivity was 0.94 (95% CI 0.92–0.95) and specificity 0.90 (95% CI 0.87–0.93); eye-level values were 0.93 (95% CI 0.91–0.95) and 0.94 (95% CI 0.92–0.96). DR subtype, retinal-field strategy, camera form factor, and prevalence independently explained heterogeneity (p < 0.05). Performance matched or exceeded pivotal FDA trials (IDx-DR, EyeArt). AI gradability was ≥ 95% in 60% of cohorts, including handheld and smartphone systems. DL-based DR screening achieves consistent, high accuracy across devices and care settings, enabling scalable deployment in primary care, pharmacies, and mobile clinics. Quality assurance and ongoing monitoring are essential to maximize population-level benefits.
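As a reading aid, the pooled patient-level estimates can be turned into predictive values with Bayes' rule. The sketch below is not part of the review's analysis: the function name `predictive_values` is illustrative, and the prevalence values are hypothetical, chosen only to show how strongly positive predictive value depends on how common referable DR is in the screened population.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test via Bayes' rule."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    # Expected proportions of each outcome in the screened population
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled patient-level point estimates reported in the abstract
POOLED_SENSITIVITY = 0.94
POOLED_SPECIFICITY = 0.90

# Hypothetical prevalences (assumptions, not figures from the review)
for prevalence in (0.05, 0.10, 0.20):
    ppv, npv = predictive_values(POOLED_SENSITIVITY, POOLED_SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

At an assumed prevalence of 10%, for instance, the pooled estimates give a PPV of roughly 0.51 and an NPV of about 0.99, so close to half of positive screens would be false referrals even though a negative screen is highly reassuring. The calculation uses point estimates only and ignores the reported confidence intervals and between-study heterogeneity.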
About the journal:
Survey of Ophthalmology is a clinically oriented review journal designed to keep ophthalmologists up to date. Comprehensive major review articles, written by experts and stringently refereed, integrate the literature on subjects selected for their clinical importance. Survey also includes feature articles, section reviews, book reviews, and abstracts.