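Comparison of Deep Learning and Clinician Performance for Detecting Referable Glaucoma from Fundus Photographs in a Safety Net Population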

Impact Factor: 3.2; JCR: Q1 (Ophthalmology)

Ophthalmology Science. Pub Date: 2025-02-25. DOI: 10.1016/j.xops.2025.100751

Van Nguyen MD, Sreenidhi Iyengar, Haroon Rasheed MD, Galo Apolo, Zhiwei Li, Aniket Kumar, Hong Nguyen, Austin Bohner MD, Kyle Bolo MD, Rahul Dhodapkar MD, Jiun Do MD, PhD, Andrew T. Duong MD, Jeffrey Gluckstein MD, Kendra Hong MD, Lucas L. Humayun, Alanna James MD, Junhui Lee MD, Kent Nguyen OD, Brandon J. Wong MD, Jose-Luis Ambite PhD, Benjamin Y. Xu MD, PhD


Citations: 0

Abstract


Comparison of Deep Learning and Clinician Performance for Detecting Referable Glaucoma from Fundus Photographs in a Safety Net Population

Purpose

Develop and test a deep learning (DL) algorithm for detecting referable glaucoma.

Design

Retrospective cohort study.

Participants

A total of 6116 patients from the Los Angeles County (LAC) Department of Health Services (DHS) were included.

Methods

The data comprised fundus photographs and patient-level labels of referable glaucoma (cup-to-disc ratio ≥0.6) provided by 21 certified optometrists. A DL algorithm based on the Visual Geometry Group-19 (VGG-19) architecture was trained using patient-level labels generalized to images from both eyes. Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity were calculated to assess algorithm performance using an independent test set that was also graded by 13 clinicians with 0 to 10 years of experience. Algorithm performance was tested using reference labels provided by either LAC DHS optometrists or an expert panel of 3 glaucoma specialists.
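
The step of generalizing patient-level labels to images from both eyes can be sketched as follows. This is a minimal illustration, not the authors' code: the `FundusImage` record and the helpers `propagate_patient_labels` and `patient_level_scores` are hypothetical names, and the max-based aggregation is an assumption, since the abstract does not say how image-level outputs were combined into a patient-level prediction.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FundusImage:
    """Hypothetical record for one fundus photograph."""
    patient_id: str
    eye: str  # "OD" (right eye) or "OS" (left eye)


def propagate_patient_labels(images, patient_labels):
    """Give every image the label of the patient it belongs to.

    patient_labels maps patient_id -> 1 (referable glaucoma) or 0 (nonglaucoma).
    """
    return [(image, patient_labels[image.patient_id]) for image in images]


def patient_level_scores(images, image_scores):
    """Collapse image-level scores to one score per patient.

    ASSUMPTION: taking the maximum over a patient's images is an illustrative
    choice only; the source does not describe the aggregation rule.
    """
    grouped = defaultdict(list)
    for image, score in zip(images, image_scores):
        grouped[image.patient_id].append(score)
    return {patient_id: max(scores) for patient_id, scores in grouped.items()}


# Toy data: patient p1 has both eyes imaged, patient p2 has one image.
images = [
    FundusImage("p1", "OD"),
    FundusImage("p1", "OS"),
    FundusImage("p2", "OD"),
]
training_pairs = propagate_patient_labels(images, {"p1": 1, "p2": 0})
scores_by_patient = patient_level_scores(images, [0.2, 0.9, 0.1])
```

One consequence of this labeling scheme is that an image from an unaffected eye of a referable patient still receives a positive label, so the image-level labels are noisier than the patient-level ones.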

Main Outcome Measures

Area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity.
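
For readers unfamiliar with these three measures, the sketch below computes them in plain Python on made-up data. The function names `auroc` and `sensitivity_specificity`, the toy scores, and the 0.5 threshold are illustrative assumptions; the abstract does not state which software or operating threshold the authors used.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win
    (the Mann-Whitney formulation of the area under the ROC curve).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up patient-level labels (1 = referable glaucoma) and algorithm scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

area = auroc(labels, scores)
sensitivity, specificity = sensitivity_specificity(labels, scores)
```

AUROC summarizes performance over all thresholds, whereas a human grader contributes a single sensitivity and specificity pair, which is why the abstract reports an AUROC for the algorithm and ranges of sensitivity and specificity for the clinicians.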

Results

The DL algorithm was trained using 12 998 images from 5616 patients (2086 referable glaucoma, 3530 nonglaucoma). In this data set, the mean age was 56.8 ± 10.5 years with 54.8% women, 68.2% Latinos, 8.9% Blacks, 6.0% Asians, and 2.7% Whites. One thousand images from 500 patients (250 referable glaucoma, 250 nonglaucoma) with similar demographics (P ≥ 0.57) were used to test the algorithm. Algorithm performance matched or exceeded that of all independent clinician graders in detecting patient-level referable glaucoma based on LAC DHS optometrist (AUROC = 0.92) or expert panel (AUROC = 0.93) reference labels. Clinician grader sensitivity (range, 0.33–0.99) and specificity (range, 0.68–0.98) ranged widely and did not correlate with years of experience (P ≥ 0.49). Algorithm performance (AUROC = 0.93) also matched or exceeded the sensitivity (range, 0.78–1.00) and specificity (range, 0.32–0.87) of 6 certified LAC DHS optometrists in the subsets of the test data set they graded.
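
The cohort counts reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the helper name `summarize_split` and the derived quantities (images per patient, fraction of referable cases) are added here for illustration and are not reported by the authors.

```python
def summarize_split(patients, referable, nonglaucoma, images):
    """Check that class counts add up and derive two simple ratios."""
    if referable + nonglaucoma != patients:
        raise ValueError("class counts do not sum to the patient count")
    return {
        "referable_fraction": referable / patients,
        "images_per_patient": images / patients,
    }


# Counts copied from the abstract.
total_patients = 6116
train = summarize_split(patients=5616, referable=2086, nonglaucoma=3530, images=12998)
test = summarize_split(patients=500, referable=250, nonglaucoma=250, images=1000)

# Training and test patients together account for the full cohort.
splits_cover_cohort = (5616 + 500 == total_patients)

print(f"train: {train['referable_fraction']:.1%} referable, "
      f"{train['images_per_patient']:.2f} images per patient")
print(f"test:  {test['referable_fraction']:.1%} referable, "
      f"{test['images_per_patient']:.2f} images per patient")
```

The test set is balanced by construction (half referable), while roughly 37% of training patients are referable. Sensitivity and specificity are conditioned on true disease status, so they do not depend on this class balance, but predictive values in a real screening population would.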

Conclusions

A DL algorithm for detecting referable glaucoma trained using patient-level data provided by certified LAC DHS optometrists approximates or exceeds performance by ophthalmologists and optometrists, who exhibit variable sensitivity and specificity unrelated to experience level. Implementation of this algorithm in screening workflows could help reallocate resources and provide more reproducible and timely glaucoma care.