Longitudinal Comparison of Geographic Atrophy Enlargement Using Manual, Semiautomated, and Deep Learning Approaches

Jacob Bogost, Rachel E. Linderman PhD, Robert Slater PhD, Thomas F. Saunders OD, Caleb Pacheco, Jeong Pak PhD, Rick Voland PhD, Barbara Blodi MD, Amitha Domalpally MD, PhD

Ophthalmology Science, Volume 5, Issue 5, Article 100787. Published April 7, 2025. DOI: 10.1016/j.xops.2025.100787
Abstract
Objective
To compare a fully automated artificial intelligence (AI) model, a semiautomated method, and manual planimetry in the longitudinal assessment of geographic atrophy (GA) using fundus autofluorescence images.
Design
A retrospective analysis of 3 GA assessment methods: AI, Heidelberg Eye Explorer semiautomated software (RegionFinder), and manual planimetry.
Subjects and Controls
One hundred eight patients (185 eyes) with GA from a phase IIb clinical trial by GlaxoSmithKline, which evaluated an experimental drug that did not reduce GA enlargement compared with the placebo.
Methods
Fundus autofluorescence images of 185 eyes were annotated using manual planimetry, semiautomated RegionFinder, and a fully automated AI model trained and validated on manual planimetry annotations at screening, year 1, and year 2. Artificial intelligence masks were compared with human-guided methods, and regression errors were assessed by stacking masks from consecutive visits. Agreement between methods was assessed using Bland−Altman plots, Dice similarity coefficient (DSC), and comparisons of GA growth rates. Artificial intelligence performance was evaluated based on its need for human edits and frequency of regression errors.
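The two per-mask quantities used here, the Dice similarity coefficient and the longitudinal regression check, are standard and can be sketched on binary masks. The function names and the strict any-pixel regression criterion below are illustrative assumptions, not the paper's implementation (the study's acceptability grading and any tolerance for registration noise are not specified in the abstract):

```python
import numpy as np

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary lesion masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def has_regression_error(earlier: np.ndarray, later: np.ndarray) -> bool:
    """GA only enlarges over time, so pixels marked atrophic at an earlier
    visit should remain atrophic later; flag any earlier-GA pixel that
    disappears in the stacked later mask."""
    return bool(np.logical_and(earlier.astype(bool),
                               ~later.astype(bool)).any())
```

In practice a pipeline would likely tolerate a few boundary pixels of disagreement between registered visits rather than flag any single pixel, but the subset check above captures the idea of stacking consecutive-visit masks.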
Main Outcome Measures
Agreement between methods was evaluated using Bland−Altman plots, DSC, and intraclass correlation coefficients (ICCs). The mean GA growth rate (mm²/year) and square root transformation of GA size were compared across methods. Artificial intelligence performance was assessed by the percentage of acceptable masks and the frequency of longitudinal regression errors.
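The square root transformation is commonly applied to GA area because it roughly linearizes enlargement and reduces the dependence of growth rate on baseline lesion size; since √(mm²) = mm, the transformed rate is in mm/year. A minimal sketch (function name and sample values are illustrative only):

```python
import math

def sqrt_growth_rate(area_start_mm2: float, area_end_mm2: float,
                     years: float) -> float:
    """Enlargement rate in mm/year after square-root transform of GA area."""
    return (math.sqrt(area_end_mm2) - math.sqrt(area_start_mm2)) / years

# Illustrative only: an eye enlarging from 4.0 mm^2 to 9.0 mm^2 over one
# year has a square-root-transformed rate of (3 - 2) / 1 = 1.0 mm/year.
```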
Results
At screening, the mean GA area was 7.22 mm² with RegionFinder, 8.37 mm² with AI, and 8.66 mm² with manual planimetry. RegionFinder measured smaller GA areas than both AI and manual, with a mean difference of −1.45 mm² (95% confidence interval [CI]: −1.56, −1.35) versus AI (ICC = 0.945) and −1.87 mm² (95% CI: −1.99, −1.75) versus manual (ICC = 0.920). Growth rates were comparable between RegionFinder (1.54 mm²/year), AI (1.68 mm²/year), and manual (1.80 mm²/year) (P = 0.25). Artificial intelligence masks were deemed acceptable in 84.8% of visits, and 81.4% of cases showed no regression over time.
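The mean differences reported above are the bias term of a Bland−Altman analysis. The sketch below computes the standard Bland−Altman quantities (bias and 95% limits of agreement) for two paired measurement series; note the CIs quoted in the abstract are confidence intervals on the bias itself, which is a separate calculation:

```python
import numpy as np

def bland_altman(method_a: np.ndarray, method_b: np.ndarray):
    """Bias (mean difference) and 95% limits of agreement between two
    paired measurement methods."""
    diff = method_a - method_b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```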
Conclusions
Artificial intelligence accurately measures GA in approximately 85% of cases, requiring human intervention in only 15%, indicating potential to streamline GA measurement in clinical trials while maintaining human oversight.
Financial Disclosure(s)
The author(s) have no proprietary or commercial interest in any materials discussed in this article.