Expert-Level Detection of Referable Glaucoma from Fundus Photographs in a Safety Net Population: The AI and Teleophthalmology in Los Angeles Initiative
Van Nguyen, Sreenidhi Iyengar, Haroon Rasheed, Galo Apolo, Zhiwei Li, Aniket Kumar, Hong Nguyen, Austin Bohner, Rahul Dhodapkar, Jiun Do, Andrew Duong, Jeffrey Gluckstein, Kendra Hong, Alanna James, Junhui Lee, Kent Nguyen, Brandon Wong, Jose-Luis Ambite, Carl Kesselman, Lauren Daskivich, Michael Pazzani, Benjamin Xu
medRxiv - Ophthalmology, published 2024-08-26. DOI: 10.1101/2024.08.25.24312563 (https://doi.org/10.1101/2024.08.25.24312563)
Abstract
Purpose: To develop and test a deep learning (DL) algorithm for detecting referable glaucoma in the Los Angeles County (LAC) Department of Health Services (DHS) teleretinal screening program.
Methods: Fundus photographs and patient-level labels of referable glaucoma (defined as cup-to-disc ratio [CDR] >= 0.6) provided by 21 trained optometrist graders were obtained from the LAC DHS teleretinal screening program. A DL algorithm based on the VGG-19 architecture was trained using patient-level labels generalized to images from both eyes. Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were calculated to assess algorithm performance on an independent test set that was also graded by 13 clinicians with 1 to 15 years of experience. Algorithm performance was tested using reference labels provided by either LAC DHS optometrists or an expert panel of 3 glaucoma specialists.
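The idea of patient-level labels generalized to images from both eyes can be illustrated with a minimal Python sketch. The data layout, patient IDs, file names, and the function name below are hypothetical and chosen only for illustration; the abstract does not describe the actual data pipeline.

```python
from typing import Dict, List, Tuple


def propagate_patient_labels(
    patient_labels: Dict[str, int],
    patient_images: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Assign each image the label of the patient it belongs to.

    patient_labels: patient ID -> 1 (referable glaucoma) or 0 (non-glaucoma).
    patient_images: patient ID -> image file names (e.g. one per eye).
    Both structures are hypothetical stand-ins for the real dataset.
    """
    image_labels: List[Tuple[str, int]] = []
    for patient_id, images in patient_images.items():
        label = patient_labels[patient_id]
        for image in images:
            # Every image inherits the patient-level label, regardless of
            # which eye prompted the referral.
            image_labels.append((image, label))
    return image_labels


# Hypothetical example data: two patients, two images each.
patient_labels = {"p001": 1, "p002": 0}
patient_images = {
    "p001": ["p001_right.jpg", "p001_left.jpg"],
    "p002": ["p002_right.jpg", "p002_left.jpg"],
}
training_pairs = propagate_patient_labels(patient_labels, patient_images)
```

Under these assumptions, an image of an unaffected eye from a referable patient would still carry a positive label, which is the trade-off of training from patient-level rather than eye-level grades.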
Results: A total of 12,098 images from 5,616 patients (2,086 referable glaucoma, 3,530 non-glaucoma) were used to train the DL algorithm. In this dataset, mean age was 56.8 +/- 10.5 years; 54.8% of patients were female, 68.2% Latino, 8.9% Black, 2.7% Caucasian, and 6.0% Asian. A total of 1,000 images from 500 patients (250 referable glaucoma, 250 non-glaucoma) with similar demographics (p >= 0.57) were used to test the DL algorithm. Algorithm performance matched or exceeded that of all independent clinician graders in detecting patient-level referable glaucoma based on LAC DHS optometrist (AUC = 0.92) or expert panel (AUC = 0.93) reference labels. Clinician grader sensitivity (range: 0.33-0.99) and specificity (range: 0.68-0.98) varied widely and did not correlate with years of experience (p >= 0.49). Algorithm performance (AUC = 0.93) also matched or exceeded the sensitivity (range: 0.78-1.00) and specificity (range: 0.32-0.87) of 6 LAC DHS optometrists in the subsets of the test dataset they graded, based on expert panel reference labels.
Conclusions: A DL algorithm for detecting referable glaucoma developed using patient-level data provided by trained LAC DHS optometrists approximates or exceeds performance by ophthalmologists and optometrists, who exhibit variable sensitivity and specificity unrelated to experience level. Implementation of this algorithm in screening workflows could help reallocate eye care resources and provide more reproducible and timely glaucoma care.
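The patient-level metrics reported above (AUC, sensitivity, specificity) can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code: aggregating per-image scores by taking the maximum, the 0.5 decision threshold, the function names, and all example numbers are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple


def patient_scores(image_scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse per-image scores to one score per patient.

    Taking the maximum is an assumed aggregation rule, not one stated
    in the abstract.
    """
    return {pid: max(scores) for pid, scores in image_scores.items()}


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    scores: List[float], labels: List[int], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical model outputs: two image scores per patient.
image_scores = {
    "p001": [0.9, 0.4],
    "p002": [0.2, 0.1],
    "p003": [0.3, 0.6],
    "p004": [0.7, 0.2],
}
reference = {"p001": 1, "p002": 0, "p003": 1, "p004": 0}

per_patient = patient_scores(image_scores)
ids = sorted(per_patient)
y_score = [per_patient[pid] for pid in ids]
y_true = [reference[pid] for pid in ids]

example_auc = auc(y_score, y_true)
example_sens, example_spec = sensitivity_specificity(y_score, y_true)
```

AUC summarizes ranking quality across all thresholds, whereas sensitivity and specificity describe a single operating point. This is why a human grader, who makes binary refer or do-not-refer calls, is represented by one sensitivity/specificity pair rather than by an AUC.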