A comparative analysis of the performance of ChatGPT-4, Gemini and Claude on the Polish Medical Final Diploma Exam and the Medical-Dental Verification Exam.

D. Wojcik, O. Adamiak, G. Czerepak, O. Tokarczuk, L. Szalewski
{"title":"A comparative analysis of the performance of chatGPT4, Gemini Gemini and Claude Claude for the Polish Medical Final Diploma Exam and Medical-Dental Verification Exam.","authors":"D. Wojcik, O. Adamiak, G. Czerepak, O. Tokarczuk, L. Szalewski","doi":"10.1101/2024.07.29.24311077","DOIUrl":null,"url":null,"abstract":"In the realm of medical education, the utility of chatbots is being explored with growing interest. One pertinent area of investigation is the performance of these models on standardized medical examinations, which are crucial for certifying the knowledge and readiness of healthcare professionals. In Poland, dental and medical students have to pass crucial exams known as LDEK (Medical-Dental Final Examination) and LEK (Medical Final Examination) exams respectively. The primary objective of this study was to conduct a comparative analysis of chatbots: ChatGPT-4, Gemini and Claude to evaluate their accuracy in answering exam questions of the LDEK and the Medical-Dental Verification Examination (LDEW), using queries in both English and Polish. The analysis of Model 2, which compared chatbots within question groups, showed that the chatbot Claude achieved the highest probability of accuracy for all question groups except the area of prosthetic dentistry compared to ChatGPT-4 and Gemini. In addition, the probability of a correct answer to questions in the field of integrated medicine is higher than in the field of dentistry for all chatbots in both prompt languages. Our results demonstrate that Claude achieved the highest accuracy in all areas analysed and outperformed other chatbots. This suggests that Claude has significant potential to support the medical education of dental students. This study showed that the performance of chatbots varied depending on the prompt language and the specific field. This highlights the importance of considering language and specialty when selecting a chatbot for educational purposes.","PeriodicalId":506788,"journal":{"name":"medRxiv","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.07.29.24311077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In the realm of medical education, the utility of chatbots is being explored with growing interest. One pertinent area of investigation is the performance of these models on standardized medical examinations, which are crucial for certifying the knowledge and readiness of healthcare professionals. In Poland, dental and medical students must pass the LDEK (Medical-Dental Final Examination) and the LEK (Medical Final Examination), respectively. The primary objective of this study was a comparative analysis of three chatbots, ChatGPT-4, Gemini, and Claude, evaluating their accuracy in answering questions from the LDEK and the Medical-Dental Verification Examination (LDEW), with queries posed in both English and Polish. The analysis of Model 2, which compared the chatbots within question groups, showed that Claude achieved the highest probability of a correct answer in every question group except prosthetic dentistry, compared with ChatGPT-4 and Gemini. In addition, for all chatbots and both prompt languages, the probability of a correct answer was higher for questions in integrated medicine than for questions in dentistry. Our results demonstrate that Claude achieved the highest accuracy in all areas analysed and outperformed the other chatbots, which suggests that Claude has significant potential to support the medical education of dental students. The study also showed that chatbot performance varied with the prompt language and the specific field, highlighting the importance of considering language and specialty when selecting a chatbot for educational purposes.
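The comparison described above reduces to scoring each model's answer against the official exam key and aggregating accuracy by model, prompt language, and question group. The following Python sketch is a minimal illustration of that evaluation protocol, not the authors' code: the QUESTIONS structure, the query_chatbot wrapper, and the single-letter (A-E) answer format are all assumptions for illustration.

    from collections import defaultdict

    # Hypothetical exam items: each has a specialty group, prompts in both
    # languages, and a single correct multiple-choice key (A-E).
    QUESTIONS = [
        {
            "group": "prosthetic dentistry",
            "prompt": {"en": "Which impression material ...?",
                       "pl": "Który materiał wyciskowy ...?"},
            "answer": "B",
        },
        # ... remaining LDEK/LDEW items would be loaded here
    ]

    def query_chatbot(model: str, prompt: str) -> str:
        # Placeholder stub: a real run would call each vendor's interface
        # here and parse out the single answer letter the model selects.
        return "A"

    def evaluate(models, languages):
        """Tally accuracy per (model, language, question group)."""
        tally = defaultdict(lambda: [0, 0])  # key -> [correct, total]
        for q in QUESTIONS:
            for model in models:
                for lang in languages:
                    reply = query_chatbot(model, q["prompt"][lang])
                    key = (model, lang, q["group"])
                    tally[key][1] += 1
                    tally[key][0] += int(reply == q["answer"])
        return {k: correct / total for k, (correct, total) in tally.items()}

    accuracy = evaluate(["ChatGPT-4", "Gemini", "Claude"], ["en", "pl"])

Per-group accuracies computed this way would feed the paper's statistical comparison (the abstract's "Model 2"); the modelling itself is described in the full text and is not reproduced here.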