Testing the power of Google DeepMind: Gemini versus ChatGPT 4 facing a European ophthalmology examination
Matteo Mario Carlà, Federico Giannuzzi, Francesco Boselli, Stanislao Rizzo
AJO International, Volume 1, Issue 3, Article 100063. Published 2024-08-14. DOI: 10.1016/j.ajoint.2024.100063
URL: https://www.sciencedirect.com/science/article/pii/S2950253524000637
Citations: 0
Abstract
Purpose
The aim of this study was to compare the performance of Google Gemini and ChatGPT-4 on a triple simulation of the European Board of Ophthalmologists (EBO) multiple-choice examination.
Design
Observational study.
Methods
The EBO multiple-choice examination consists of 52 questions, each followed by 5 statements, for a total of 260 answers. Each statement may be answered “True”, “False”, or “Don't know”: a correct answer is awarded 1 point, an incorrect answer is penalized 0.5 points, and “Don't know” scores 0 points. At least 60 % correct answers are needed to pass the exam. After the rules were explained to the chatbots, the entire question with its 5 statements was input. The rate of correct answers and the final score were recorded. The exam simulation was repeated 3 times with randomly generated questions.
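A minimal sketch of the scoring scheme described above, assuming the pass mark is 60 % correct answers out of the 260 statements (the function names are illustrative, not part of the study's code):

```python
# Sketch of the EBO scoring rules as stated in the Methods:
# +1 per correct statement, -0.5 per incorrect statement, 0 for "Don't know".
TOTAL_STATEMENTS = 52 * 5  # 52 questions x 5 statements = 260 answers

def ebo_score(correct: int, incorrect: int, dont_know: int) -> float:
    """Raw score for one full exam simulation."""
    assert correct + incorrect + dont_know == TOTAL_STATEMENTS
    return correct * 1.0 - incorrect * 0.5

def ebo_passed(correct: int) -> bool:
    """Assumed pass criterion: at least 60 % of the 260 answers correct."""
    return correct / TOTAL_STATEMENTS >= 0.60
```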
Results
Google Gemini and ChatGPT-4 succeeded in the EBO exam simulation in all 3 cases, with an average of 85.3 ± 3.1 % and 83.3 ± 2.4 % correct answers, respectively. Gemini had a lower error rate than ChatGPT (6.7 ± 1.5 % vs. 13.0 ± 2.6 %, p = 0.03) but answered “Don't know” more frequently (8.0 ± 2.7 % vs. 3.7 ± 1.5 %, p = 0.05). Both chatbots scored at least 70 % correct answers in each exam subspecialty across the 3 simulations. Converting the percentages into points, Gemini averaged 213.5 ± 9.3 points, compared with 199.8 ± 7.1 points for ChatGPT (p = 0.21).
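A back-of-the-envelope check of the percentage-to-points conversion, using the average correct and error rates reported above (illustrative arithmetic only, not the authors' analysis code):

```python
# Points = (correct fraction * 260) - 0.5 * (error fraction * 260);
# "Don't know" answers contribute 0 points.
TOTAL_STATEMENTS = 260

def points_from_rates(correct_frac: float, error_frac: float) -> float:
    return TOTAL_STATEMENTS * (correct_frac - 0.5 * error_frac)

print(points_from_rates(0.853, 0.067))  # ~213, consistent with Gemini's 213.5 +/- 9.3
print(points_from_rates(0.833, 0.130))  # ~200, consistent with ChatGPT-4's 199.8 +/- 7.1
```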
Conclusions
Google Gemini and ChatGPT-4 can both succeed in a complex ophthalmology examination covering a wide range of topics, with higher accuracy than their previous versions, highlighting their evolving importance in educational and informational settings.
Precis
Google Gemini and ChatGPT-4 both succeeded in 3 consecutive exam simulations of the European Board of Ophthalmologists, with an average of 85 % and 83 % correct answers, respectively. Google Gemini made significantly fewer errors than ChatGPT.