Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination

Matteo Mario Carlà, Federico Giannuzzi, Francesco Boselli, Stanislao Rizzo
{"title":"Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination","authors":"Matteo Mario Carlà ,&nbsp;Federico Giannuzzi ,&nbsp;Francesco Boselli ,&nbsp;Stanislao Rizzo","doi":"10.1016/j.ajoint.2024.100063","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><p>The aim of this study was to compare the performances of Google Gemini and ChatGPT-4, facing a triple simulation of the European Board of Ophthalmologists (EBO) multiple choices exam.</p></div><div><h3>Design</h3><p>Observational study.</p></div><div><h3>Methods</h3><p>The EBO multiple choice examination consists of 52 questions followed by 5 statements each, for a total of 260 answers. Statements may be answered with “True”, “False” or “Don't Know”: a correct answer is awarded 1 point; an incorrect is penalized 0.5 points; “don't know” scores 0 points. At least 60 % correct answers are needed to pass the exam. After explaining the rules to the chatbots, he entire question with the 5 statements was input. The rate of correct answers and the final score were collected. The exam simulation was repeated 3 times with randomly generated questions.</p></div><div><h3>Results</h3><p>Google Gemini and ChatGPT-4 succeed in EBO exam simulations in all 3 cases, with an average 85.3 ± 3.1 % and 83.3 ± 2.4 % of correct answers. Gemini had a lower error rate compared to ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, <em>p</em> = 0.03), but answered “Don't know” more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, <em>p</em> = 0.05). Both chatbots scored at least 70 % of correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini scored 213.5 ± 9.3 points on average, compared to 199.8 ± 7.1 points for ChatGPT (<em>p</em> = 0.21).</p></div><div><h3>Conclusions</h3><p>Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination on widespread topics, with higher accuracy compared to their former versions, highlighting their evolving importance in educational and informative setting.</p></div><div><h3>Precis</h3><p>Google Gemini and ChatGPT-4 were both able to succeed in 3 consecutive exam simulations of the European Board of Ophthalmologists with an average of 85 % and 83 % correct answers, respectively. Google Gemini showed significantly less errors when compared to ChatGPT.</p></div>","PeriodicalId":100071,"journal":{"name":"AJO International","volume":"1 3","pages":"Article 100063"},"PeriodicalIF":0.0000,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2950253524000637/pdfft?md5=4e9c209c1a98a7ea76ea9d21c9040f92&pid=1-s2.0-S2950253524000637-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AJO International","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2950253524000637","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Purpose

The aim of this study was to compare the performance of Google Gemini and ChatGPT-4 on three simulations of the European Board of Ophthalmologists (EBO) multiple-choice examination.

Design

Observational study.

Methods

The EBO multiple-choice examination consists of 52 questions, each followed by 5 statements, for a total of 260 answers. Each statement may be answered "True", "False" or "Don't know": a correct answer is awarded 1 point, an incorrect answer is penalized 0.5 points, and "Don't know" scores 0 points. At least 60 % correct answers are needed to pass the exam. After explaining the rules to the chatbots, the entire question with its 5 statements was entered as input. The rate of correct answers and the final score were collected. The exam simulation was repeated 3 times with randomly generated questions.
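To make the scoring rule concrete, a minimal sketch of the computation is shown below (illustrative Python only, not the authors' code; the example tallies are hypothetical values chosen to sum to 260 statements).

def ebo_score(correct, incorrect, dont_know):
    # +1 per correct answer, -0.5 per incorrect answer, 0 for "Don't know"
    total = correct + incorrect + dont_know        # 260 statements in a full simulation
    score = correct * 1.0 - incorrect * 0.5
    passed = correct / total >= 0.60               # at least 60 % correct answers to pass
    return score, passed

# Example: 222 correct, 17 incorrect, 21 "Don't know" out of 260 statements
print(ebo_score(222, 17, 21))                      # -> (213.5, True)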

Results

Google Gemini and ChatGPT-4 passed the EBO exam simulation in all 3 cases, with an average of 85.3 ± 3.1 % and 83.3 ± 2.4 % correct answers, respectively. Gemini had a lower error rate than ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, p = 0.03), but answered "Don't know" more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, p = 0.05). Both chatbots scored at least 70 % correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini scored 213.5 ± 9.3 points on average, compared with 199.8 ± 7.1 points for ChatGPT (p = 0.21).
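As a rough consistency check (an approximation reconstructed from the rounded mean percentages, not additional data reported by the authors), applying the scoring rule above to ChatGPT-4's average rates gives about 0.833 × 260 − 0.5 × 0.130 × 260 ≈ 216.6 − 16.9 ≈ 199.7 points, in line with the reported 199.8 ± 7.1; the same calculation for Gemini (0.853 × 260 − 0.5 × 0.067 × 260 ≈ 213.1) matches its reported 213.5 ± 9.3.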

Conclusions

Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination covering a wide range of topics, with higher accuracy than their earlier versions, highlighting their growing importance in educational and informational settings.

Precis

Google Gemini and ChatGPT-4 were both able to succeed in 3 consecutive exam simulations of the European Board of Ophthalmologists, with an average of 85 % and 83 % correct answers, respectively. Google Gemini made significantly fewer errors than ChatGPT.
