Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis

Jalil Jalili, PhD; Anuwat Jiravarnsirikul, MD; Christopher Bowd, PhD; Benton Chuter, MD; Akram Belghith, PhD; Michael H. Goldbaum, MD; Sally L. Baxter, MD; Robert N. Weinreb, MD; Linda M. Zangwill, PhD; Mark Christopher, PhD

Ophthalmology Science 5(2), Article 100667. Published 2024-11-29.
Impact factor 3.2 (JCR Q1, Ophthalmology)
DOI: 10.1016/j.xops.2024.100667
Article: https://www.sciencedirect.com/science/article/pii/S2666914524002033
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11773068/pdf/

Abstract

Purpose

To assess the diagnostic accuracy of GPT-4V (OpenAI) and its ability to identify glaucoma-related features, relative to expert evaluations.

Design

Evaluation of multimodal large language models for reviewing fundus images in glaucoma.

Subjects

A total of 300 fundus images from 3 public datasets (ACRIMA, ORIGA, and RIM-ONE v3), comprising 139 glaucomatous and 161 nonglaucomatous cases, were analyzed.

Methods

Preprocessing ensured each image was centered on the optic disc. GPT-4's vision-preview model (GPT-4V) assessed each image for various glaucoma-related criteria: image quality, image gradability, cup-to-disc ratio, peripapillary atrophy, disc hemorrhages, rim thinning (by quadrant and clock hour), glaucoma status, and estimated probability of glaucoma. Each image was analyzed twice by GPT-4V to evaluate consistency in its predictions. Two expert graders independently evaluated the same images using identical criteria. Comparisons between GPT-4V's assessments, expert evaluations, and dataset labels were made to determine accuracy, sensitivity, specificity, and Cohen kappa.
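
The four statistics named above can be computed from paired binary labels roughly as follows. This is a minimal sketch, not the authors' analysis code: the function name `binary_metrics`, the 1 = glaucoma / 0 = nonglaucoma encoding, and the sample labels are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and Cohen kappa for binary labels.

    Assumed encoding (hypothetical): 1 = glaucoma, 0 = nonglaucoma.
    y_true is the reference (dataset label or expert grade); y_pred is the
    rater being evaluated (for example, a model's output).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    nan = float("nan")
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan

    # Cohen kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from each rater's marginal rates.
    p_observed = accuracy
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance < 1 else nan

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up labels for illustration only; not data from the study.
    reference = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(reference, predicted))
```

Kappa is reported alongside accuracy because raw agreement can look high when one class dominates; kappa discounts the agreement expected from the marginal rates alone.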

Main Outcome Measures

The main parameters measured were the accuracy, sensitivity, specificity, and Cohen kappa of GPT-4V in detecting glaucoma compared with expert evaluations.

Results

GPT-4V provided glaucoma assessments for all 300 fundus images across the datasets, although approximately 35% required multiple prompt submissions. Across the ACRIMA, ORIGA, and RIM-ONE datasets, GPT-4V's overall accuracy in glaucoma detection (0.68, 0.70, and 0.81, respectively) was slightly lower than that of expert grader 1 (0.78, 0.80, and 0.88, respectively) and expert grader 2 (0.72, 0.78, and 0.87, respectively). In glaucoma detection, GPT-4V's agreement varied by dataset and expert grader, with Cohen kappa values ranging from 0.08 to 0.72. In feature detection, GPT-4V demonstrated high consistency (repeatability) in image gradability, with an agreement accuracy of ≥89%, and substantial agreement in rim thinning and cup-to-disc ratio assessments, although kappas were generally lower than those for expert-to-expert agreement.
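
The repeatability figures above compare two passes over the same images. A minimal sketch of how such test-retest agreement might be computed for a categorical feature is given below; the function names, the label strings, and the sample values are hypothetical, and the abstract does not state how the authors computed these numbers.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(run_a: Sequence[Hashable], run_b: Sequence[Hashable]) -> float:
    """Fraction of items that received the same label in both runs."""
    if len(run_a) != len(run_b) or len(run_a) == 0:
        raise ValueError("runs must be non-empty and of equal length")
    return sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)


def cohen_kappa(run_a: Sequence[Hashable], run_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen kappa for any number of categories."""
    n = len(run_a)
    p_observed = percent_agreement(run_a, run_b)
    counts_a, counts_b = Counter(run_a), Counter(run_b)
    # Chance agreement: sum over categories of the product of marginal rates.
    categories = counts_a.keys() | counts_b.keys()
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance >= 1:
        return float("nan")  # both runs used a single identical label
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Made-up gradability labels from two passes; not data from the study.
    first_pass = ["gradable", "gradable", "ungradable", "gradable", "ungradable"]
    second_pass = ["gradable", "gradable", "ungradable", "ungradable", "ungradable"]
    print(percent_agreement(first_pass, second_pass))
    print(cohen_kappa(first_pass, second_pass))
```

In this toy example the two passes agree on 4 of 5 images (0.8), but kappa is lower (about 0.62) because part of that agreement is expected by chance.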

Conclusions

GPT-4V shows promise as a tool for glaucoma screening and detection through fundus image analysis, demonstrating generally high agreement with expert evaluations of key diagnostic features, although agreement varied substantially across datasets.

Financial Disclosure(s)

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.