Mehmet Cem Sabaner, Arzu Seyhan Karatepe Hashas, Kemal Mert Mutibayraktaroglu, Zubeyir Yozgat, Oliver Niels Klefter, Yousif Subhi
{"title":"基于人工智能的大型语言模型在瑞典语医学水平测试中眼科相关问题上的表现:ChatGPT-4 omni vs Gemini 1.5 Pro","authors":"Mehmet Cem Sabaner , Arzu Seyhan Karatepe Hashas , Kemal Mert Mutibayraktaroglu , Zubeyir Yozgat , Oliver Niels Klefter , Yousif Subhi","doi":"10.1016/j.ajoint.2024.100070","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>To compare the interpretation and response context of two commonly used artificial intelligence (AI)-based large language model (LLM) platforms to ophthalmology-related multiple choice questions (MCQs) in the Swedish proficiency test for medicine (“<em>kunskapsprov för läkare</em>”) exams.</div></div><div><h3>Design</h3><div>Observational study.</div></div><div><h3>Methods</h3><div>The questions of a total of 29 exams held between 2016 and 2024 were reviewed. All ophthalmology-related questions were included in this study, and categorized into ophthalmology sections. Questions were asked to ChatGPT-4o and Gemini 1.5 Pro AI-based LLM chatbots in Swedish and English with specific commands. Secondly, all MCQs were asked again without feedback. As the final step, feedback was given for questions that were still answered incorrectly, and all questions were subsequently re-asked.</div></div><div><h3>Results</h3><div>A total of 134 ophthalmology-related questions out of 4876 MCQs were evaluated via both AI-based LLMs. The MCQ count in the 29 exams was 4.62 ± 2.21 (range: 0–8). After the final step, ChatGPT-4o achieved higher accuracy in Swedish (94 %) and English (95.5 %) compared to Gemini 1.5 Pro (both at 88.1 %) (<em>p</em> = <em>0.13</em>, and <em>p</em> = <em>0.04</em>, respectively). Moreover, ChatGPT-4o provided more correct answers in the neuro-ophthalmology section (<em>n</em> = 47) compared to Gemini 1.5 Pro across all three attempts in English (<em>p</em> <em><</em> <em>0.05</em>). There was no statistically significant difference either in the inter-AI comparison of other ophthalmology sections or in the inter-lingual comparison within AIs.</div></div><div><h3>Conclusion</h3><div>Both AI-based LLMs, and especially ChatGPT-4o, appear to perform well in ophthalmology-related MCQs. AI-based LLMs can contribute to ophthalmological medical education not only by selecting correct answers to MCQs but also by providing explanations.</div></div>","PeriodicalId":100071,"journal":{"name":"AJO International","volume":"1 4","pages":"Article 100070"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The performance of artificial intelligence-based large language models on ophthalmology-related questions in Swedish proficiency test for medicine: ChatGPT-4 omni vs Gemini 1.5 Pro\",\"authors\":\"Mehmet Cem Sabaner , Arzu Seyhan Karatepe Hashas , Kemal Mert Mutibayraktaroglu , Zubeyir Yozgat , Oliver Niels Klefter , Yousif Subhi\",\"doi\":\"10.1016/j.ajoint.2024.100070\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><div>To compare the interpretation and response context of two commonly used artificial intelligence (AI)-based large language model (LLM) platforms to ophthalmology-related multiple choice questions (MCQs) in the Swedish proficiency test for medicine (“<em>kunskapsprov för läkare</em>”) exams.</div></div><div><h3>Design</h3><div>Observational study.</div></div><div><h3>Methods</h3><div>The questions of a total of 29 exams held between 2016 and 2024 were reviewed. 
All ophthalmology-related questions were included in this study, and categorized into ophthalmology sections. Questions were asked to ChatGPT-4o and Gemini 1.5 Pro AI-based LLM chatbots in Swedish and English with specific commands. Secondly, all MCQs were asked again without feedback. As the final step, feedback was given for questions that were still answered incorrectly, and all questions were subsequently re-asked.</div></div><div><h3>Results</h3><div>A total of 134 ophthalmology-related questions out of 4876 MCQs were evaluated via both AI-based LLMs. The MCQ count in the 29 exams was 4.62 ± 2.21 (range: 0–8). After the final step, ChatGPT-4o achieved higher accuracy in Swedish (94 %) and English (95.5 %) compared to Gemini 1.5 Pro (both at 88.1 %) (<em>p</em> = <em>0.13</em>, and <em>p</em> = <em>0.04</em>, respectively). Moreover, ChatGPT-4o provided more correct answers in the neuro-ophthalmology section (<em>n</em> = 47) compared to Gemini 1.5 Pro across all three attempts in English (<em>p</em> <em><</em> <em>0.05</em>). There was no statistically significant difference either in the inter-AI comparison of other ophthalmology sections or in the inter-lingual comparison within AIs.</div></div><div><h3>Conclusion</h3><div>Both AI-based LLMs, and especially ChatGPT-4o, appear to perform well in ophthalmology-related MCQs. AI-based LLMs can contribute to ophthalmological medical education not only by selecting correct answers to MCQs but also by providing explanations.</div></div>\",\"PeriodicalId\":100071,\"journal\":{\"name\":\"AJO International\",\"volume\":\"1 4\",\"pages\":\"Article 100070\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AJO International\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2950253524000704\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AJO International","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2950253524000704","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The performance of artificial intelligence-based large language models on ophthalmology-related questions in the Swedish proficiency test for medicine: ChatGPT-4 omni vs Gemini 1.5 Pro
Purpose
To compare how two commonly used artificial intelligence (AI)-based large language model (LLM) platforms interpret and respond to ophthalmology-related multiple-choice questions (MCQs) from the Swedish proficiency test for medicine (“kunskapsprov för läkare”) exams.
Design
Observational study.
Methods
Questions from a total of 29 exams held between 2016 and 2024 were reviewed. All ophthalmology-related questions were included in this study and categorized into ophthalmology subsections. The questions were posed to the ChatGPT-4o and Gemini 1.5 Pro AI-based LLM chatbots in Swedish and English using specific prompts. In a second round, all MCQs were asked again without any feedback. As a final step, feedback was given for questions that were still answered incorrectly, and all questions were then re-asked.
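A minimal sketch of this three-round procedure is shown below. The prompt wording, the feedback text, and the ask_model helper are illustrative assumptions; the abstract does not specify the exact commands or interface the authors used.

```python
# Hypothetical sketch of the three-round questioning protocol described above.
# The feedback text and ask_model() are assumptions, not the authors' implementation.

def ask_model(model: str, question: str, feedback: str | None = None) -> str:
    """Placeholder for one chatbot call; should return the option letter chosen."""
    raise NotImplementedError("connect this to the chatbot interface or API you use")

def run_protocol(model: str, mcqs: list[dict]) -> dict:
    """mcqs: [{'id': ..., 'text': ..., 'answer': 'B'}, ...] in Swedish or English."""
    results = {q["id"]: [] for q in mcqs}

    # Round 1: each MCQ is asked once with the standard instruction.
    for q in mcqs:
        results[q["id"]].append(ask_model(model, q["text"]) == q["answer"])

    # Round 2: all MCQs are asked again, still without feedback.
    for q in mcqs:
        results[q["id"]].append(ask_model(model, q["text"]) == q["answer"])

    # Round 3: questions still answered incorrectly receive feedback,
    # and every question is then re-asked one final time.
    for q in mcqs:
        feedback = None
        if not results[q["id"]][-1]:  # still wrong after round 2
            feedback = "Your previous answer was incorrect; please reconsider."
        results[q["id"]].append(ask_model(model, q["text"], feedback) == q["answer"])

    return results
```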
Results
A total of 134 ophthalmology-related questions out of 4876 MCQs were evaluated with both AI-based LLMs. The mean number of ophthalmology-related MCQs per exam across the 29 exams was 4.62 ± 2.21 (range: 0–8). After the final step, ChatGPT-4o achieved higher accuracy in Swedish (94 %) and English (95.5 %) than Gemini 1.5 Pro (88.1 % in both languages) (p = 0.13 and p = 0.04, respectively). Moreover, ChatGPT-4o provided more correct answers than Gemini 1.5 Pro in the neuro-ophthalmology section (n = 47) across all three attempts in English (p < 0.05). There was no statistically significant difference either in the inter-AI comparison of the other ophthalmology sections or in the inter-language comparison within each AI.
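As an illustration of the English final-round difference, the sketch below compares the two accuracies with a Fisher's exact test. The correct-answer counts are rounded back from the reported percentages, and the abstract does not state which statistical test the authors used (the paired design may call for a different test), so this is an assumption rather than a reproduction of the study's analysis.

```python
# Illustrative comparison of the English final-round accuracies, assuming
# counts rounded back from the reported percentages and a Fisher's exact test;
# the abstract does not name the test the authors actually used.
from scipy.stats import fisher_exact

n = 134                               # ophthalmology-related MCQs
chatgpt_correct = round(0.955 * n)    # 95.5 % of 134 ≈ 128
gemini_correct = round(0.881 * n)     # 88.1 % of 134 ≈ 118

table = [
    [chatgpt_correct, n - chatgpt_correct],
    [gemini_correct, n - gemini_correct],
]
odds_ratio, p_value = fisher_exact(table)
print(f"ChatGPT-4o {chatgpt_correct}/{n} vs Gemini 1.5 Pro {gemini_correct}/{n}: p = {p_value:.3f}")
```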
Conclusion
Both AI-based LLMs, and especially ChatGPT-4o, appear to perform well on ophthalmology-related MCQs. AI-based LLMs can contribute to ophthalmological medical education not only by selecting correct answers to MCQs but also by providing explanations.