Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination
Matteo Mario Carlà, Federico Giannuzzi, Francesco Boselli, Stanislao Rizzo
AJO International, Volume 1, Issue 3, Article 100063. Published 2024-08-14. DOI: 10.1016/j.ajoint.2024.100063
URL: https://www.sciencedirect.com/science/article/pii/S2950253524000637
Citations: 0
Abstract
Purpose
The aim of this study was to compare the performance of Google Gemini and ChatGPT-4 on a triple simulation of the European Board of Ophthalmologists (EBO) multiple-choice examination.
Design
Observational study.
Methods
The EBO multiple-choice examination consists of 52 questions, each followed by 5 statements, for a total of 260 answers. Each statement may be answered “True”, “False”, or “Don't know”: a correct answer is awarded 1 point, an incorrect answer is penalized 0.5 points, and “Don't know” scores 0 points. At least 60 % correct answers are needed to pass the exam. After the rules were explained to the chatbots, the entire question with its 5 statements was input. The rate of correct answers and the final score were recorded. The exam simulation was repeated 3 times with randomly generated questions.
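A minimal sketch of the scoring scheme described above, assuming the pass mark is 60 % correct answers out of the 260 statements (the function names are illustrative, not part of the study's code):

```python
# Sketch of the EBO scoring rules as stated in the Methods:
# +1 per correct statement, -0.5 per incorrect statement, 0 for "Don't know".
TOTAL_STATEMENTS = 52 * 5  # 52 questions x 5 statements = 260 answers

def ebo_score(correct: int, incorrect: int, dont_know: int) -> float:
    """Raw score for one full exam simulation."""
    assert correct + incorrect + dont_know == TOTAL_STATEMENTS
    return correct * 1.0 - incorrect * 0.5

def ebo_passed(correct: int) -> bool:
    """Assumed pass criterion: at least 60 % of the 260 answers correct."""
    return correct / TOTAL_STATEMENTS >= 0.60
```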
Results
Google Gemini and ChatGPT-4 succeeded in the EBO exam simulation in all 3 cases, with an average of 85.3 ± 3.1 % and 83.3 ± 2.4 % correct answers, respectively. Gemini had a lower error rate than ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, p = 0.03) but answered “Don't know” more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, p = 0.05). Both chatbots scored at least 70 % correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini averaged 213.5 ± 9.3 points, compared with 199.8 ± 7.1 points for ChatGPT (p = 0.21).
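A back-of-the-envelope check of the percentage-to-points conversion, using the average correct and error rates reported above (illustrative arithmetic only, not the authors' analysis code):

```python
# Points = (correct fraction * 260) - 0.5 * (error fraction * 260);
# "Don't know" answers contribute 0 points.
TOTAL_STATEMENTS = 260

def points_from_rates(correct_frac: float, error_frac: float) -> float:
    return TOTAL_STATEMENTS * (correct_frac - 0.5 * error_frac)

print(points_from_rates(0.853, 0.067))  # ~213, consistent with Gemini's 213.5 +/- 9.3
print(points_from_rates(0.833, 0.130))  # ~200, consistent with ChatGPT-4's 199.8 +/- 7.1
```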
Conclusions
Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination covering a wide range of topics, with higher accuracy than their previous versions, highlighting their evolving importance in educational and informational settings.
Precis
Google Gemini and ChatGPT-4 both succeeded in 3 consecutive exam simulations of the European Board of Ophthalmologists, with an average of 85 % and 83 % correct answers, respectively. Google Gemini made significantly fewer errors than ChatGPT.