Joshua D. Stein MD, MS , Hong Su An PhD , Chris A. Andrews PhD , Suzann Pershing MD , Tushar Mungle PhD , Amanda K. Bicket MD, MSE , Julie M. Rosenthal MD, MS , Amy D. Zhang MD , Wen-Shin Lee MD , Cassie Ludwig MD, MS , Bethlehem Mekonnen MD , Tina Hernandez-Boussard PhD
{"title":"Enhanced Phenotype Identification of Common Ocular Diseases in Real-World Datasets","authors":"Joshua D. Stein MD, MS , Hong Su An PhD , Chris A. Andrews PhD , Suzann Pershing MD , Tushar Mungle PhD , Amanda K. Bicket MD, MSE , Julie M. Rosenthal MD, MS , Amy D. Zhang MD , Wen-Shin Lee MD , Cassie Ludwig MD, MS , Bethlehem Mekonnen MD , Tina Hernandez-Boussard PhD","doi":"10.1016/j.xops.2025.100717","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><div>For studies using real-world data, accurately identifying patients with phenotypes of interest is challenging. To identify cohorts of interest, most studies exclusively use the International Classification of Diseases (ICD) billing codes, which can be limiting. We developed a method to accurately identify the presence or absence of 3 common ocular diseases (diabetic retinopathy [DR], age-related macular degeneration [AMD], and glaucoma) using electronic health record (EHR) data.</div></div><div><h3>Design</h3><div>Database study.</div></div><div><h3>Participants</h3><div>Three thousand nine hundred fourteen eyes from 1957 patients at 2 Sight OUtcomes Research CollaborativE (SOURCE) Ophthalmology Data Repository sites.</div></div><div><h3>Methods</h3><div>We developed enhanced phenotype identification (EPI) algorithms that search EHR fields, including eye examination findings, orders, charges, medication prescriptions, and surgery data for evidence that a patient has glaucoma, DR, or AMD. We trained our EPI models using gold standard assessments of the EHR by ophthalmologists for the presence/absence of these conditions, compared the performance of our EPI models to models developed using ICD codes alone, and validated the performance of model using data from another SOURCE site.</div></div><div><h3>Main Outcome Measures</h3><div>Area under the receiver operating curve (AUC), area under the precision–recall curve (AUPRC), and model calibration.</div></div><div><h3>Results</h3><div>The AUCs of our EPI models were better than ICD-only models for glaucoma (0.97 vs. 0.90), DR (0.997 vs. 0.98), and AMD (0.99 vs. 0.95). The AUPRCs of our EPI models were also much better than ICD-only models for glaucoma (0.79 vs. 0.32), DR (0.96 vs. 0.84), and AMD (0.74 vs. 0.55). When testing on patients from a second SOURCE site, the AUC and AUPRC for glaucoma (0.93, 0.74), DR (0.98, 0.77), and AMD (0.96, 0.64) were slightly worse than the primary site but still quite high. However, for all 3 conditions, model calibration was worse at the second site.</div></div><div><h3>Conclusions</h3><div>Leveraging machine learning, we developed EPI models to accurately identify most patients with glaucoma, DR, and AMD in real-world datasets. The EPI models significantly outperform ICD-only models in identifying patients confirmed to have these conditions. These findings underscore the potential of using comprehensive EHR data combined with advanced machine learning techniques to improve the accuracy of patient phenotype identification, leading to better patient management and clinical outcomes.</div></div><div><h3>Financial Disclosure(s)</h3><div>Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.</div></div>","PeriodicalId":74363,"journal":{"name":"Ophthalmology science","volume":"5 4","pages":"Article 100717"},"PeriodicalIF":3.2000,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ophthalmology science","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666914525000156","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Objective
For studies using real-world data, accurately identifying patients with phenotypes of interest is challenging. To identify cohorts of interest, most studies exclusively use the International Classification of Diseases (ICD) billing codes, which can be limiting. We developed a method to accurately identify the presence or absence of 3 common ocular diseases (diabetic retinopathy [DR], age-related macular degeneration [AMD], and glaucoma) using electronic health record (EHR) data.
Design
Database study.
Participants
Three thousand nine hundred fourteen eyes from 1957 patients at 2 Sight OUtcomes Research CollaborativE (SOURCE) Ophthalmology Data Repository sites.
Methods
We developed enhanced phenotype identification (EPI) algorithms that search EHR fields, including eye examination findings, orders, charges, medication prescriptions, and surgery data for evidence that a patient has glaucoma, DR, or AMD. We trained our EPI models using gold standard assessments of the EHR by ophthalmologists for the presence/absence of these conditions, compared the performance of our EPI models to models developed using ICD codes alone, and validated the performance of model using data from another SOURCE site.
Main Outcome Measures
Area under the receiver operating curve (AUC), area under the precision–recall curve (AUPRC), and model calibration.
Results
The AUCs of our EPI models were better than ICD-only models for glaucoma (0.97 vs. 0.90), DR (0.997 vs. 0.98), and AMD (0.99 vs. 0.95). The AUPRCs of our EPI models were also much better than ICD-only models for glaucoma (0.79 vs. 0.32), DR (0.96 vs. 0.84), and AMD (0.74 vs. 0.55). When testing on patients from a second SOURCE site, the AUC and AUPRC for glaucoma (0.93, 0.74), DR (0.98, 0.77), and AMD (0.96, 0.64) were slightly worse than the primary site but still quite high. However, for all 3 conditions, model calibration was worse at the second site.
Conclusions
Leveraging machine learning, we developed EPI models to accurately identify most patients with glaucoma, DR, and AMD in real-world datasets. The EPI models significantly outperform ICD-only models in identifying patients confirmed to have these conditions. These findings underscore the potential of using comprehensive EHR data combined with advanced machine learning techniques to improve the accuracy of patient phenotype identification, leading to better patient management and clinical outcomes.
Financial Disclosure(s)
Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.