Artificial Intelligence vs. Human Cognition: A Comparative Analysis of ChatGPT and Candidates Sitting the European Board of Ophthalmology Diploma Examination

Q2 Medicine
Anna P Maino, Jakub Klikowski, Brendan Strong, Wahid Ghaffari, Michał Woźniak, Tristan Bourcier, Andrzej Grzybowski
{"title":"人工智能与人类认知:ChatGPT和参加欧洲眼科文凭考试的考生的比较分析。","authors":"Anna P Maino, Jakub Klikowski, Brendan Strong, Wahid Ghaffari, Michał Woźniak, Tristan Bourcier, Andrzej Grzybowski","doi":"10.3390/vision9020031","DOIUrl":null,"url":null,"abstract":"<p><strong>Background/objectives: </strong>This paper aims to assess ChatGPT's performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare these results to pass benchmarks and candidate results.</p><p><strong>Methods: </strong>This cross-sectional study used a sample of past exam papers from 2012, 2013, 2020-2023 EBOD examinations. This study analyzed ChatGPT's responses to 440 multiple choice questions (MCQs), each containing five true/false statements (2200 statements in total) and 48 single best answer (SBA) questions.</p><p><strong>Results: </strong>ChatGPT, for MCQs, scored on average 64.39%. ChatGPT's strongest metric performance for MCQs was precision (68.76%). ChatGPT performed best at answering pathology MCQs (Grubbs test <i>p</i> < 0.05). Optics and refraction had the lowest-scoring MCQ performance across all metrics. ChatGPT-3.5 Turbo performed worse than human candidates and ChatGPT-4o on easy questions (75% vs. 100% accuracy) but outperformed humans and ChatGPT-4o on challenging questions (50% vs. 28% accuracy). ChatGPT's SBA performance averaged 28.43%, with the highest score and strongest performance in precision (29.36%). Pathology SBA questions were consistently the lowest-scoring topic across most metrics. ChatGPT demonstrated a nonsignificant tendency to select option 1 more frequently (<i>p</i> = 0.19). When answering SBAs, human candidates scored higher than ChatGPT in all metric areas measured.</p><p><strong>Conclusions: </strong>ChatGPT performed stronger for true/false questions, scoring a pass mark in most instances. Performance was poorer for SBA questions, suggesting that ChatGPT's ability in information retrieval is better than that in knowledge integration. ChatGPT could become a valuable tool in ophthalmic education, allowing exam boards to test their exam papers to ensure they are pitched at the right level, marking open-ended questions and providing detailed feedback.</p>","PeriodicalId":36586,"journal":{"name":"Vision (Switzerland)","volume":"9 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12015923/pdf/","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence vs. Human Cognition: A Comparative Analysis of ChatGPT and Candidates Sitting the European Board of Ophthalmology Diploma Examination.\",\"authors\":\"Anna P Maino, Jakub Klikowski, Brendan Strong, Wahid Ghaffari, Michał Woźniak, Tristan Bourcier, Andrzej Grzybowski\",\"doi\":\"10.3390/vision9020031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background/objectives: </strong>This paper aims to assess ChatGPT's performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare these results to pass benchmarks and candidate results.</p><p><strong>Methods: </strong>This cross-sectional study used a sample of past exam papers from 2012, 2013, 2020-2023 EBOD examinations. 
This study analyzed ChatGPT's responses to 440 multiple choice questions (MCQs), each containing five true/false statements (2200 statements in total) and 48 single best answer (SBA) questions.</p><p><strong>Results: </strong>ChatGPT, for MCQs, scored on average 64.39%. ChatGPT's strongest metric performance for MCQs was precision (68.76%). ChatGPT performed best at answering pathology MCQs (Grubbs test <i>p</i> < 0.05). Optics and refraction had the lowest-scoring MCQ performance across all metrics. ChatGPT-3.5 Turbo performed worse than human candidates and ChatGPT-4o on easy questions (75% vs. 100% accuracy) but outperformed humans and ChatGPT-4o on challenging questions (50% vs. 28% accuracy). ChatGPT's SBA performance averaged 28.43%, with the highest score and strongest performance in precision (29.36%). Pathology SBA questions were consistently the lowest-scoring topic across most metrics. ChatGPT demonstrated a nonsignificant tendency to select option 1 more frequently (<i>p</i> = 0.19). When answering SBAs, human candidates scored higher than ChatGPT in all metric areas measured.</p><p><strong>Conclusions: </strong>ChatGPT performed stronger for true/false questions, scoring a pass mark in most instances. Performance was poorer for SBA questions, suggesting that ChatGPT's ability in information retrieval is better than that in knowledge integration. ChatGPT could become a valuable tool in ophthalmic education, allowing exam boards to test their exam papers to ensure they are pitched at the right level, marking open-ended questions and providing detailed feedback.</p>\",\"PeriodicalId\":36586,\"journal\":{\"name\":\"Vision (Switzerland)\",\"volume\":\"9 2\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12015923/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Vision (Switzerland)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3390/vision9020031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Vision (Switzerland)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/vision9020031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Medicine","Score":null,"Total":0}
Citations: 0

Abstract

Background/objectives: This paper aims to assess ChatGPT's performance in answering European Board of Ophthalmology Diploma (EBOD) examination papers and to compare its results against pass benchmarks and human candidates' results.

Methods: This cross-sectional study used a sample of past exam papers from the 2012, 2013, and 2020-2023 EBOD examinations. It analyzed ChatGPT's responses to 440 multiple-choice questions (MCQs), each containing five true/false statements (2200 statements in total), and to 48 single-best-answer (SBA) questions.
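The abstract does not include the scoring code, but the evaluation it describes reduces to standard binary-classification metrics over the 2200 true/false statements. The sketch below shows one way such scoring could be computed; the file name and column names (ebod_mcq_statements.csv, key, chatgpt_answer, topic) are assumptions for illustration, not taken from the study.

```python
# Illustrative sketch only; not the authors' code. Assumes a hypothetical CSV
# with one row per true/false statement: the official key, ChatGPT's answer,
# and the exam topic (e.g. pathology, optics and refraction).
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

df = pd.read_csv("ebod_mcq_statements.csv")  # hypothetical file name

# Treat "true" as the positive class, so precision is the share of
# statements ChatGPT marked true that were actually true.
y_true = df["key"].str.lower().eq("true")
y_pred = df["chatgpt_answer"].str.lower().eq("true")

print(f"accuracy : {accuracy_score(y_true, y_pred):.2%}")
print(f"precision: {precision_score(y_true, y_pred):.2%}")
print(f"recall   : {recall_score(y_true, y_pred):.2%}")
print(f"F1       : {f1_score(y_true, y_pred):.2%}")

# Per-topic accuracy, mirroring the topic-level comparison in the Results.
for topic, grp in df.groupby("topic"):
    topic_true = grp["key"].str.lower().eq("true")
    topic_pred = grp["chatgpt_answer"].str.lower().eq("true")
    print(f"{topic}: {topic_true.eq(topic_pred).mean():.2%}")
```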

Results: For MCQs, ChatGPT scored 64.39% on average. Its strongest MCQ metric was precision (68.76%). ChatGPT performed best at answering pathology MCQs (Grubbs test p < 0.05). Optics and refraction was the lowest-scoring MCQ topic across all metrics. ChatGPT-3.5 Turbo performed worse than human candidates and ChatGPT-4o on easy questions (75% vs. 100% accuracy) but outperformed humans and ChatGPT-4o on challenging questions (50% vs. 28% accuracy). For SBAs, ChatGPT averaged 28.43%, with precision again its strongest metric (29.36%). Pathology SBA questions were consistently the lowest-scoring topic across most metrics. ChatGPT showed a nonsignificant tendency to select option 1 more frequently (p = 0.19). When answering SBAs, human candidates scored higher than ChatGPT in all metrics measured.
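The topic-level claim above relies on a Grubbs test to flag pathology as an outlying MCQ score. As a reference for that statistic, here is a minimal two-sided Grubbs test; the per-topic scores in the example are invented for illustration and are not the study's data.

```python
# Minimal two-sided Grubbs outlier test (the statistic cited for the
# pathology MCQ result). The example scores are hypothetical.
import numpy as np
from scipy import stats

def grubbs_test(values, alpha=0.05):
    """Return (G, G_critical, is_outlier) for the most extreme value."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
    # Critical value from the t distribution, adjusted over the n candidate points.
    t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))
    return g, g_crit, g > g_crit

# Hypothetical per-topic MCQ accuracies (%); one topic scores well above the rest.
topic_scores = [64.0, 62.5, 66.1, 63.8, 78.9, 61.2]
print(grubbs_test(topic_scores))
```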

Conclusions: ChatGPT performed more strongly on true/false questions, scoring a pass mark in most instances. Performance was poorer on SBA questions, suggesting that ChatGPT's ability in information retrieval is better than its ability in knowledge integration. ChatGPT could become a valuable tool in ophthalmic education, allowing exam boards to test their exam papers to ensure they are pitched at the right level, to mark open-ended questions, and to provide detailed feedback.

Source journal
Vision (Switzerland) (Health Professions: Optometry)
CiteScore: 2.30
Self-citation rate: 0.00%
Articles published: 62
Review time: 11 weeks