Performance of Chatgpt in ophthalmology exam; human versus AI.

IF 1.4, Zone 4 (Medicine), Q3 OPHTHALMOLOGY

International Ophthalmology Pub Date : 2024-11-06 DOI:10.1007/s10792-024-03353-w

Ali Safa Balci, Zeliha Yazar, Banu Turgut Ozturk, Cigdem Altan


Citations: 0

Abstract

Purpose: This cross-sectional study evaluates the success rate of ChatGPT in answering questions from the 'Resident Training Development Exam' and compares the results with the performance of ophthalmology residents.

Methods: The 75 exam questions, spanning nine sections and three difficulty levels, were presented to ChatGPT, and its responses and explanations were recorded. The readability and complexity of the explanations were analyzed, and the Flesch Reading Ease (FRE) score (0-100) was recorded using the Readable program. Residents were categorized into four groups by seniority. The overall and seniority-specific success rates of the residents were each compared with that of ChatGPT.
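For orientation, the FRE score mentioned in the Methods is conventionally computed as 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word), with lower scores indicating harder text. The following is a minimal Python sketch of that standard formula. It is not the implementation used by Readable: the function names and the vowel-group syllable counter are illustrative assumptions, and the syllable heuristic is deliberately crude.

```python
import re


def count_syllables(word: str) -> int:
    """Crude syllable estimate (assumption): count runs of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' (but not 'le') as silent when other vowel groups exist.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula applied to plain English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Short sentences made of short words score high, while dense clinical vocabulary pushes the score down. Because the syllable counter is only a heuristic, scores from this sketch will not exactly match those of a dedicated tool.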

Results: Out of 69 questions, ChatGPT answered 37 correctly (53.62%). Its highest success rate was in Lens and Cataract (77.77%), and its lowest in Pediatric Ophthalmology and Strabismus (0.00%). Across 789 residents, overall accuracy was 50.37%. Seniority-specific accuracy rates were 43.49%, 51.30%, 54.91%, and 60.05% for 1st- to 4th-year residents, respectively. ChatGPT ranked 292nd among the residents. By difficulty, 11 questions were easy, 44 moderate, and 14 difficult; ChatGPT's accuracy at these levels was 63.63%, 54.54%, and 42.85%, respectively. The average FRE score of the responses generated by ChatGPT was 27.56 ± 12.40.
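The difficulty-level percentages can be cross-checked against the overall figure with simple arithmetic. The abstract does not state the number of correct answers per level, so the counts below (7, 24, and 6) are inferred from the reported percentages and question counts and should be read as an assumption. This Python sketch only shows that they are consistent with 37 correct answers out of 69.

```python
# Question counts per difficulty level, as reported in the Results.
questions = {"easy": 11, "moderate": 44, "difficult": 14}

# ASSUMPTION: correct-answer counts inferred from the reported percentages;
# the abstract itself does not list them.
correct_inferred = {"easy": 7, "moderate": 24, "difficult": 6}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage of questions answered correctly."""
    return 100.0 * correct / total


per_level = {
    level: accuracy_pct(correct_inferred[level], total)
    for level, total in questions.items()
}
total_correct = sum(correct_inferred.values())
total_questions = sum(questions.values())
overall = accuracy_pct(total_correct, total_questions)

for level, pct in per_level.items():
    print(f"{level}: {pct:.4f}%")
print(f"overall: {total_correct}/{total_questions} = {overall:.4f}%")
```

Under these inferred counts, the per-level values agree with the reported 63.63%, 54.54%, and 42.85% when truncated to two decimals instead of rounded, which suggests (but does not confirm) that the reported figures were truncated.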

Conclusion: ChatGPT correctly answered 53.6% of the questions in an exam for residents. On average, ChatGPT has a lower success rate than a 3rd-year resident. The responses provided by ChatGPT have low readability and are difficult to understand. As question difficulty increases, ChatGPT's success decreases. Predictably, these results will change as more information is loaded into ChatGPT.


Source journal

International Ophthalmology (OPHTHALMOLOGY)

CiteScore: 3.20

Self-citation rate: 0.00%

Articles published: 451

About the journal: International Ophthalmology provides the clinician with articles on all the relevant subspecialties of ophthalmology, with a broad international scope. The emphasis is on presentation of the latest clinical research in the field. In addition, the journal includes regular sections devoted to new developments in technologies, products, and techniques.