Can off-the-shelf visual large language models detect and diagnose ocular diseases from retinal photographs?

Impact factor: 2.0 · JCR Q2 (Ophthalmology)
Sahana Srinivasan, Hongwei Ji, David Ziyou Chen, Wendy Wong, Zhi Da Soh, Jocelyn Hui Lin Goh, Krithi Pushpanathan, Xiaofei Wang, Weizhi Ma, Tien Yin Wong, Ya Xing Wang, Ching-Yu Cheng, Yih Chung Tham
BMJ Open Ophthalmology, volume 10, issue 1. Published 2025-04-07. DOI: 10.1136/bmjophth-2024-002076 (https://doi.org/10.1136/bmjophth-2024-002076)

Abstract

Background: The advent of generative artificial intelligence has led to the emergence of multiple vision large language models (VLLMs). This study aimed to evaluate the capabilities of commonly available VLLMs, such as OpenAI's GPT-4V and Google's Gemini, in detecting and diagnosing ocular diseases from retinal images.

Methods and analysis: From the Singapore Epidemiology of Eye Diseases (SEED) study, we selected 44 representative retinal photographs, including 10 healthy and 34 representing six eye diseases (age-related macular degeneration, diabetic retinopathy, glaucoma, visually significant cataract, myopic macular degeneration and retinal vein occlusion). OpenAI's GPT-4V (both default and data analyst modes) and Google Gemini were prompted with each image to determine if the retina was normal or abnormal and to provide diagnostic descriptions if deemed abnormal. The outputs from the VLLMs were evaluated for accuracy by three attending-level ophthalmologists using a three-point scale (poor, borderline, good).

Results: GPT-4V default mode demonstrated the highest detection rate, correctly identifying 33 of the 34 abnormal images (97.1%) and outperforming its data analyst mode (61.8%) and Google Gemini (41.2%). Despite the relatively high detection rates, the quality of diagnostic descriptions was generally suboptimal, with only 21.2% of GPT-4V (default) responses, 4.8% of GPT-4V (data analyst) responses and 28.6% of Google Gemini responses rated as good.

Conclusions: Although GPT-4V default mode showed generally high sensitivity in abnormality detection, all evaluated VLLMs were inadequate in providing accurate diagnoses for ocular diseases. These findings emphasise the need for domain-customised VLLMs and suggest the continued need for human oversight in clinical ophthalmology.
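
As a rough check on the arithmetic behind the reported detection rates, the short Python sketch below recomputes the one figure the abstract states as a count (33 of 34) and back-calculates the counts implied by the other two percentages. The function names are hypothetical, and the counts for the data analyst mode and Google Gemini are inferred from the rounded percentages, not stated in the abstract.

```python
def detection_rate_pct(detected: int, total: int) -> float:
    """Percentage of abnormal images flagged as abnormal, to one decimal place."""
    if total <= 0 or not 0 <= detected <= total:
        raise ValueError("need 0 <= detected <= total and total > 0")
    return round(100 * detected / total, 1)


def implied_count(rate_pct: float, total: int) -> int:
    """Back-calculate the whole-number count that a reported percentage implies."""
    return round(rate_pct / 100 * total)


ABNORMAL_IMAGES = 34  # disease photographs, as stated in the abstract

# Stated in the abstract: 33 of 34 detected by GPT-4V default mode.
print(detection_rate_pct(33, ABNORMAL_IMAGES))  # 97.1

# Inferred, not stated: counts implied by the other two reported rates.
for model, pct in [("GPT-4V data analyst", 61.8), ("Google Gemini", 41.2)]:
    n = implied_count(pct, ABNORMAL_IMAGES)
    print(model, n, detection_rate_pct(n, ABNORMAL_IMAGES))
```

With only 34 abnormal images, each image moves the detection rate by roughly 3 percentage points, which is worth keeping in mind when comparing the three figures.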

Source journal: BMJ Open Ophthalmology (category: Ophthalmology)
CiteScore: 3.40
Self-citation rate: 4.20%
Articles published: 104
Review time: 20 weeks