Fundus photograph interpretation of common retinal disorders by artificial intelligence chatbots

Andrew Mihalache, Ryan S. Huang, Marko M. Popovic, Peng Yan, Rajeev H. Muni, Suber S. Huang, David T. Wong

AJO International, Volume 2, Issue 3, Article 100154. Published 2025-07-05. DOI: 10.1016/j.ajoint.2025.100154
Abstract
Purpose
While previous studies have examined the ability of artificial intelligence (AI) chatbots to interpret optical coherence tomography scans, their performance in interpreting fundus photographs of retinal disorders without text-based context remains unexplored. This study aims to evaluate the ability of three widely used AI chatbots to accurately diagnose common retinal disorders from fundus photographs in the absence of text-based context.
Design
Cross-sectional study.
Methods
In March 2024, we prompted ChatGPT-4, Gemini, and Copilot with a set of 50 fundus photographs from the American Society of Retina Specialists Retina Image Bank®, depicting age-related macular degeneration, diabetic retinopathy, epiretinal membrane, retinal vein occlusion, and retinal detachment. The chatbots were re-prompted four times with the same images throughout June 2024. The primary endpoint was the proportion of correct diagnoses by each chatbot. No text-based guidance was provided.
Results
In March 2024, Gemini provided a correct diagnosis for 17 of 50 fundus images (34%, 95% CI: 21–49%), ChatGPT-4 for 16 (32%, 95% CI: 20–47%), and Copilot for 9 (18%, 95% CI: 9–31%) (p > 0.05). In June 2024, across 200 prompts (50 images, four repetitions each), Gemini provided a correct diagnosis for 122 (61%, 95% CI: 53–67%), ChatGPT-4 for 101 (51%, 95% CI: 43–58%), and Copilot for 57 (29%, 95% CI: 22–35%).
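The abstract does not state how the 95% confidence intervals were computed; the reported bounds (e.g. 21–49% for 17/50) are consistent with an exact binomial (Clopper–Pearson) interval. A minimal pure-Python sketch of that calculation, under the assumption that the exact method was used:

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed from the exact pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion.

    Found by bisecting the binomial CDF in p, so no scipy is required.
    """
    def solve(target_cdf: float, k_cdf: int) -> float:
        # binom_cdf(k_cdf, n, p) is strictly decreasing in p, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(200):  # interval shrinks well below float precision
            mid = (lo + hi) / 2
            if binom_cdf(k_cdf, n, mid) > target_cdf:
                lo = mid  # p too small: CDF still above the target
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(1 - alpha / 2, k - 1)
    upper = 1.0 if k == n else solve(alpha / 2, k)
    return lower, upper


# Example: Gemini's March 2024 result, 17 correct of 50 images.
low, high = clopper_pearson(17, 50)
print(f"{17 / 50:.0%} (95% CI: {low:.0%}-{high:.0%})")
```

The bounds it returns for 17/50 round to roughly the 21–49% interval reported above; the same function applied to 16/50 and 9/50 reproduces the other March intervals.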
Conclusion
No AI chatbot in this study was sufficiently accurate for the diagnosis of common retinal disorders from fundus photographs. Given these accuracy limitations and the attendant bioethical considerations, AI chatbots should not currently be used in any clinical setting involving fundus images.