The performance of OpenAI ChatGPT-4 and Google Gemini in virology multiple-choice questions: a comparative analysis of English and Arabic responses.

Impact Factor: 1.6 | Q2, Multidisciplinary Sciences
Malik Sallam, Kholoud Al-Mahzoum, Rawan Ahmad Almutawaa, Jasmen Ahmad Alhashash, Retaj Abdullah Dashti, Danah Raed AlSafy, Reem Abdullah Almutairi, Muna Barakat
DOI: 10.1186/s13104-024-06920-7
Journal: BMC Research Notes
Publication date: 2024-09-03 (Journal Article)
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11373487/pdf/
Citations: 0

Abstract

Objective: The integration of artificial intelligence (AI) in healthcare education is inevitable. Understanding the proficiency of generative AI in answering complex questions in different languages is crucial for educational purposes. The study objective was to compare the performance of ChatGPT-4 and Gemini in answering virology multiple-choice questions (MCQs) in English and Arabic, while assessing the quality of the generated content. Both AI models' responses to 40 virology MCQs were assessed for correctness and quality based on the CLEAR tool designed for evaluation of AI-generated content. The MCQs were classified into lower and higher cognitive categories based on the revised Bloom's taxonomy. The study design followed the METRICS checklist for the design and reporting of generative AI-based studies in healthcare.

Results: ChatGPT-4 and Gemini performed better in English than in Arabic, with ChatGPT-4 consistently surpassing Gemini in correctness and CLEAR scores. ChatGPT-4 led Gemini with 80% vs. 62.5% correctness in English, compared to 65% vs. 55% in Arabic. Both AI models performed better in lower cognitive domains. Both ChatGPT-4 and Gemini exhibited potential in educational applications; nevertheless, their performance varied across languages, highlighting the importance of continued development to ensure effective AI integration in healthcare education globally.
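The reported correctness rates can be reproduced from raw counts out of the 40 MCQs. The sketch below is not from the study's own code; it simply tabulates the figures stated in the abstract, assuming each percentage is computed over all 40 questions:

```python
# Correctness counts implied by the abstract's percentages,
# assuming 40 MCQs per model/language condition.
TOTAL = 40
correct = {
    ("ChatGPT-4", "English"): 32,  # 80% of 40
    ("Gemini", "English"): 25,     # 62.5% of 40
    ("ChatGPT-4", "Arabic"): 26,   # 65% of 40
    ("Gemini", "Arabic"): 22,      # 55% of 40
}

for (model, lang), n in correct.items():
    # Format each rate as a percentage with one decimal place.
    print(f"{model} ({lang}): {n}/{TOTAL} = {n / TOTAL:.1%}")
```

Each count divided by 40 matches the percentage reported in the Results section.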

Source journal: BMC Research Notes (Biochemistry, Genetics and Molecular Biology, all)
CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 363
Review time: 15 weeks

About the journal: BMC Research Notes publishes scientifically valid research outputs that cannot be considered full research or methodology articles. The journal supports the research community across all scientific and clinical disciplines by providing an open access forum for sharing data and useful information; this includes, but is not limited to, updates to previous work, additions to established methods, short publications, null results, research proposals, and data management plans.