Longitudinal Comparison of Geographic Atrophy Enlargement Using Manual, Semiautomated, and Deep Learning Approaches

Jacob Bogost, Rachel E. Linderman PhD, Robert Slater PhD, Thomas F. Saunders OD, Caleb Pacheco, Jeong Pak PhD, Rick Voland PhD, Barbara Blodi MD, Amitha Domalpally MD, PhD

Ophthalmology Science, Volume 5, Issue 5, Article 100787. Published April 7, 2025. DOI: 10.1016/j.xops.2025.100787
Abstract
Objective
To compare a fully automated artificial intelligence (AI) model, a semiautomated method, and manual planimetry in the longitudinal assessment of geographic atrophy (GA) using fundus autofluorescence images.
Design
A retrospective analysis of 3 GA assessment methods: AI, Heidelberg Eye Explorer semiautomated software (RegionFinder), and manual planimetry.
Subjects and Controls
One hundred eight patients (185 eyes) with GA from a phase IIb clinical trial by GlaxoSmithKline, which evaluated an experimental drug that did not reduce GA enlargement compared with the placebo.
Methods
Fundus autofluorescence images of 185 eyes were annotated using manual planimetry, semiautomated RegionFinder, and a fully automated AI model trained and validated on manual planimetry annotations at screening, year 1, and year 2. Artificial intelligence masks were compared with human-guided methods, and regression errors were assessed by stacking masks from consecutive visits. Agreement between methods was assessed using Bland−Altman plots, Dice similarity coefficient (DSC), and comparisons of GA growth rates. Artificial intelligence performance was evaluated based on its need for human edits and frequency of regression errors.
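The two per-mask quantities used here, the Dice similarity coefficient and the longitudinal regression check, are standard and can be sketched on binary masks. The function names and the strict any-pixel regression criterion below are illustrative assumptions, not the paper's implementation (the study's acceptability grading and any tolerance for registration noise are not specified in the abstract):

```python
import numpy as np

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary lesion masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def has_regression_error(earlier: np.ndarray, later: np.ndarray) -> bool:
    """GA only enlarges over time, so pixels marked atrophic at an earlier
    visit should remain atrophic later; flag any earlier-GA pixel that
    disappears in the stacked later mask."""
    return bool(np.logical_and(earlier.astype(bool),
                               ~later.astype(bool)).any())
```

In practice a pipeline would likely tolerate a few boundary pixels of disagreement between registered visits rather than flag any single pixel, but the subset check above captures the idea of stacking consecutive-visit masks.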
Main Outcome Measures
Agreement between methods was evaluated using Bland−Altman plots, DSC, and intraclass correlation coefficients (ICCs). The mean GA growth rate (mm²/year) and square root transformation of GA size were compared across methods. Artificial intelligence performance was assessed by the percentage of acceptable masks and the frequency of longitudinal regression errors.
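The square root transformation is commonly applied to GA area because it roughly linearizes enlargement and reduces the dependence of growth rate on baseline lesion size; since √(mm²) = mm, the transformed rate is in mm/year. A minimal sketch (function name and sample values are illustrative only):

```python
import math

def sqrt_growth_rate(area_start_mm2: float, area_end_mm2: float,
                     years: float) -> float:
    """Enlargement rate in mm/year after square-root transform of GA area."""
    return (math.sqrt(area_end_mm2) - math.sqrt(area_start_mm2)) / years

# Illustrative only: an eye enlarging from 4.0 mm^2 to 9.0 mm^2 over one
# year has a square-root-transformed rate of (3 - 2) / 1 = 1.0 mm/year.
```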
Results
At screening, the mean GA area was 7.22 mm² with RegionFinder, 8.37 mm² with AI, and 8.66 mm² with manual planimetry. RegionFinder measured smaller GA areas than both AI and manual, with a mean difference of −1.45 mm² (95% confidence interval [CI]: −1.56, −1.35) versus AI (ICC = 0.945) and −1.87 mm² (95% CI: −1.99, −1.75) versus manual (ICC = 0.920). Growth rates were comparable between RegionFinder (1.54 mm²/year), AI (1.68 mm²/year), and manual (1.80 mm²/year) (P = 0.25). Artificial intelligence masks were deemed acceptable in 84.8% of visits, and 81.4% of cases showed no regression over time.
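The mean differences reported above are the bias term of a Bland−Altman analysis. The sketch below computes the standard Bland−Altman quantities (bias and 95% limits of agreement) for two paired measurement series; note the CIs quoted in the abstract are confidence intervals on the bias itself, which is a separate calculation:

```python
import numpy as np

def bland_altman(method_a: np.ndarray, method_b: np.ndarray):
    """Bias (mean difference) and 95% limits of agreement between two
    paired measurement methods."""
    diff = method_a - method_b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```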
Conclusions
Artificial intelligence accurately measures GA in approximately 85% of cases, requiring human intervention in only 15%, indicating potential to streamline GA measurement in clinical trials while maintaining human oversight.
Financial Disclosure(s)
The author(s) have no proprietary or commercial interest in any materials discussed in this article.