Fundus photograph interpretation of common retinal disorders by artificial intelligence chatbots

Andrew Mihalache, Ryan S. Huang, Marko M. Popovic, Peng Yan, Rajeev H. Muni, Suber S. Huang, David T. Wong
{"title":"Fundus photograph interpretation of common retinal disorders by artificial intelligence chatbots","authors":"Andrew Mihalache ,&nbsp;Ryan S. Huang ,&nbsp;Marko M. Popovic ,&nbsp;Peng Yan ,&nbsp;Rajeev H. Muni ,&nbsp;Suber S. Huang ,&nbsp;David T. Wong","doi":"10.1016/j.ajoint.2025.100154","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>While previous studies have examined the ability of artificial intelligence (AI) chatbots to interpret optical coherence tomography scans, their performance in interpreting fundus photographs of retinal disorders without text-based context remains unexplored. This study aims to evaluate the ability of three widely used AI chatbots to accurately diagnose common retinal disorders from fundus photographs in the absence of text-based context.</div></div><div><h3>Design</h3><div>Cross-sectional study.</div></div><div><h3>Methods</h3><div>We prompted ChatGPT-4, Gemini, and Copilot, with a set of 50 fundus photographs from the American Society of Retina Specialists Retina Image Bank® in March 2024, comprising age-related macular degeneration, diabetic retinopathy, epiretinal membrane, retinal vein occlusion, and retinal detachment. Chatbots were re-prompted four times using the same images throughout June 2024. The primary endpoint was the proportion of each chatbot’s correct diagnoses. No text-based guidance was provided.</div></div><div><h3>Results</h3><div>In March 2024, Gemini provided a correct diagnosis for 17 (34 %, 95 % CI: 21–49 %) fundus images, ChatGPT-4 for 16 (32 %, 95 % CI: 20–47 %), and Copilot for 9 (18 %, 95 % CI: 9–31 %) (<em>p</em> &gt; 0.05). In June 2024, Gemini provided a correct diagnosis for 122 (61 %, 95 % CI: 53–67 %) images, ChatGPT-4 for 101 (51 %, 95 % CI: 43–58 %), and Copilot for 57 (29 %, 95 % CI: 22–35 %).</div></div><div><h3>Conclusion</h3><div>No AI chatbot use in this study was sufficiently accurate for the diagnosis of common retinal disorders from fundus photographs. AI chatbots should not currently be utilized in any clinical setting involving fundus images, given concerns for accuracy and bioethical considerations.</div></div>","PeriodicalId":100071,"journal":{"name":"AJO International","volume":"2 3","pages":"Article 100154"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AJO International","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2950253525000577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Purpose

While previous studies have examined the ability of artificial intelligence (AI) chatbots to interpret optical coherence tomography scans, their performance in interpreting fundus photographs of retinal disorders without text-based context remains unexplored. This study aims to evaluate the ability of three widely used AI chatbots to accurately diagnose common retinal disorders from fundus photographs in the absence of text-based context.

Design

Cross-sectional study.

Methods

In March 2024, we prompted ChatGPT-4, Gemini, and Copilot with a set of 50 fundus photographs from the American Society of Retina Specialists Retina Image Bank®, depicting age-related macular degeneration, diabetic retinopathy, epiretinal membrane, retinal vein occlusion, and retinal detachment. The chatbots were re-prompted four times with the same images throughout June 2024, yielding 200 responses per chatbot. The primary endpoint was the proportion of correct diagnoses for each chatbot. No text-based guidance was provided.
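As a minimal sketch, the primary endpoint could be tabulated as below, assuming each response is graded by matching the chatbot's stated diagnosis against the image's ground-truth label (the grading procedure itself is not detailed in the abstract):

```python
# Minimal sketch of the primary endpoint: the proportion of correct
# diagnoses per chatbot. The exact-match grading rule and the data layout
# are illustrative assumptions, not the study's documented pipeline.

def proportion_correct(predicted: list[str], truth: list[str]) -> float:
    """Fraction of responses whose diagnosis matches the ground-truth label."""
    if len(predicted) != len(truth):
        raise ValueError("each prediction needs a ground-truth label")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Four re-prompt rounds over the same 50 images yield 4 * 50 = 200
# (prediction, truth) pairs per chatbot, the denominator behind the
# June 2024 percentages.
```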

Results

In March 2024, Gemini provided a correct diagnosis for 17 of 50 fundus images (34%, 95% CI: 21–49%), ChatGPT-4 for 16 (32%, 95% CI: 20–47%), and Copilot for 9 (18%, 95% CI: 9–31%) (p > 0.05). In June 2024, Gemini provided a correct diagnosis for 122 of 200 responses (61%, 95% CI: 53–67%), ChatGPT-4 for 101 (51%, 95% CI: 43–58%), and Copilot for 57 (29%, 95% CI: 22–35%).
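The reported intervals are consistent with exact (Clopper-Pearson) binomial confidence intervals. A minimal sketch that reproduces the March 2024 bounds, assuming that interval method (the abstract does not state which one the authors used):

```python
from scipy.stats import beta

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# March 2024: correct diagnoses out of 50 fundus images per chatbot.
for name, k in [("Gemini", 17), ("ChatGPT-4", 16), ("Copilot", 9)]:
    lo, hi = clopper_pearson(k, 50)
    print(f"{name}: {k}/50 = {k/50:.0%} (95% CI: {lo:.0%}-{hi:.0%})")
# Gemini: 17/50 = 34% (95% CI: 21%-49%)
# ChatGPT-4: 16/50 = 32% (95% CI: 20%-47%)
# Copilot: 9/50 = 18% (95% CI: 9%-31%)
```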

Conclusion

No AI chatbot evaluated in this study was sufficiently accurate for diagnosing common retinal disorders from fundus photographs. Given concerns regarding accuracy and bioethics, AI chatbots should not currently be used in any clinical setting involving fundus images.