Analysis of ChatGPT-4's performance on ophthalmology questions from the MIR exam

Archivos de la Sociedad Espanola de Oftalmologia. Pub Date: 2025-06-01. DOI: 10.1016/j.oftale.2025.05.002

C.E. Monera Lucas, C. Mora Caballero, J. Escolano Serrano, A. Machan, G. Castilla Martínez, D. Romero Valero, J. Campello Lluch

Volume 100, Issue 6, Pages 314-319. Journal Article. https://www.sciencedirect.com/science/article/pii/S2173579425000775


Abstract

Purpose

To evaluate the performance of ChatGPT-4 in solving clinical scenarios in ophthalmology, specifically the ophthalmology questions from the Médico Interno Residente (MIR) exam, the Spanish entrance examination for medical specialty training.

Design

Cross-sectional study evaluating a diagnostic tool.

Method

Ophthalmology questions from the 2010–2023 sessions of the MIR exam were collected. The proportion of questions that ChatGPT answered correctly was calculated, and the results were compared with those obtained by ophthalmology professionals. Sensitivity, specificity, and positive and negative likelihood ratios were also calculated.
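The diagnostic metrics named in the Method can be illustrated with a minimal sketch. The abstract does not state how the underlying 2x2 table was built from multiple-choice answers, so the `ConfusionCounts` type, the `diagnostic_metrics` function, and the counts in the usage lines are hypothetical and serve only to show the standard formulas.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 table; how the study derived its counts is not stated."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity and both likelihood ratios."""
    if c.tp + c.fn == 0 or c.tn + c.fp == 0:
        raise ValueError("need at least one positive and one negative case")

    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)

    # LR+ = sensitivity / (1 - specificity); unbounded when specificity is 1.
    lr_positive = math.inf if specificity == 1 else sensitivity / (1 - specificity)
    # LR- = (1 - sensitivity) / specificity; treated as infinity in this
    # sketch when specificity is 0.
    lr_negative = math.inf if specificity == 0 else (1 - sensitivity) / specificity

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    example = ConfusionCounts(tp=90, fp=20, fn=10, tn=80)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

A positive likelihood ratio well above 1 and a negative likelihood ratio close to 0 both indicate a tool that discriminates well between the two outcome classes.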

Results

A total of 54 questions were collected, with those from the subspecialty "Retina" being the most frequent. ChatGPT's overall score was 90.2%, with a sensitivity of 92.59% and a specificity of 96.8%. The average concordance with the evaluators' answers was 86.41%. The agreement between the evaluators was 79.62%.
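The concordance and agreement figures above are percentages of matching answers. The abstract does not describe the exact procedure, so the following is only a sketch under the assumption that concordance means simple percent agreement, question by question; the function names and the answer lists are hypothetical.

```python
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Percentage of questions on which two answer lists give the same option."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    if not a:
        raise ValueError("answer lists must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_concordance(model: Sequence[str],
                     evaluators: Sequence[Sequence[str]]) -> float:
    """Average percent agreement between the model and each evaluator."""
    if not evaluators:
        raise ValueError("at least one evaluator is required")
    scores = [percent_agreement(model, e) for e in evaluators]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Illustrative answers only, not data from the study.
    model_answers = ["A", "B", "C", "D"]
    evaluator_1 = ["A", "B", "C", "D"]
    evaluator_2 = ["A", "B", "D", "D"]
    print(mean_concordance(model_answers, [evaluator_1, evaluator_2]))
    print(percent_agreement(evaluator_1, evaluator_2))
```

Simple percent agreement does not correct for matches expected by chance; a chance-corrected statistic such as Cohen's kappa would be an alternative, though the abstract does not indicate that one was used.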

Conclusions

ChatGPT-4 is a useful tool for solving clinical scenarios and theoretical questions in ophthalmology. Proper use of the tool, supervised by professionals, can help optimize the care processes for ophthalmology patients.


