Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy

Impact Factor: 1.8 · Q2 (Education, Scientific Disciplines)
Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi, Michael H Campbell, Kandamaran Krishnamurthy, Rhaheem NA Layne-Yarde, Alok Kumar, Dale CR Springer, Kenneth L Connell, Md Anwarul Azim Majumder
{"title":"Comparing the Performance of ChatGPT-4 and Medical Students on MCQs at Varied Levels of Bloom’s Taxonomy","authors":"Ambadasu Bharatha, Nkemcho Ojeh, Ahbab Mohammad Fazle Rabbi, Michael H Campbell, Kandamaran Krishnamurthy, Rhaheem NA Layne-Yarde, Alok Kumar, Dale CR Springer, Kenneth L Connell, Md Anwarul Azim Majumder","doi":"10.2147/amep.s457408","DOIUrl":null,"url":null,"abstract":"<strong>Introduction:</strong> This research investigated the capabilities of ChatGPT-4 compared to medical students in answering MCQs using the revised Bloom’s Taxonomy as a benchmark.<br/><strong>Methods:</strong> A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed on MCQs from various medical courses using computer-based testing.<br/><strong>Results:</strong> The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) compared to students (66.7%). Course type significantly affected ChatGPT-4’s performance, but revised Bloom’s Taxonomy levels did not. A detailed association check between program levels and Bloom’s taxonomy levels for correct answers by ChatGPT-4 showed a highly significant correlation (p&lt; 0.001), reflecting a concentration of “remember-level” questions in preclinical and “evaluate-level” questions in clinical courses.<br/><strong>Discussion:</strong> The study highlights ChatGPT-4’s proficiency in standardized tests but indicates limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies based on course content.<br/><strong>Conclusion:</strong> While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address limitations. Further research is needed to explore AI’s impact on medical education and student performance across educational levels and courses.<br/><br/><strong>Keywords:</strong> artificial intelligence, ChatGPT-4’s, medical students, knowledge, interpretation abilities, multiple choice questions<br/>","PeriodicalId":47404,"journal":{"name":"Advances in Medical Education and Practice","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Medical Education and Practice","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2147/amep.s457408","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: This research compared the performance of ChatGPT-4 and medical students in answering MCQs, using the revised Bloom’s Taxonomy as a benchmark.
Methods: A cross-sectional study was conducted at The University of the West Indies, Barbados. ChatGPT-4 and medical students were assessed on MCQs from various medical courses using computer-based testing.
Results: The study included 304 MCQs. Students demonstrated good knowledge, with 78% correctly answering at least 90% of the questions. However, ChatGPT-4 achieved a higher overall score (73.7%) than the students (66.7%). Course type significantly affected ChatGPT-4’s performance, but revised Bloom’s Taxonomy levels did not. A detailed check of the association between program levels and Bloom’s Taxonomy levels for ChatGPT-4’s correct answers showed a highly significant association (p < 0.001), reflecting a concentration of “remember-level” questions in preclinical courses and “evaluate-level” questions in clinical courses (a computational sketch of these tests follows the abstract).
Discussion: The study highlights ChatGPT-4’s proficiency in standardized tests but indicates limitations in clinical reasoning and practical skills. This performance discrepancy suggests that the effectiveness of artificial intelligence (AI) varies based on course content.
Conclusion: While ChatGPT-4 shows promise as an educational tool, its role should be supplementary, with strategic integration into medical education to leverage its strengths and address its limitations. Further research is needed to explore AI’s impact on medical education and student performance across educational levels and courses.

Keywords: artificial intelligence, ChatGPT-4, medical students, knowledge, interpretation abilities, multiple choice questions
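To make the reported statistics concrete, below is a minimal sketch in Python (not the authors’ analysis code) of the two tests implied in the Results: a chi-square comparison of the overall scores, using the only figures the abstract gives (73.7% vs 66.7% of 304 MCQs), and a chi-square test of independence between program level and Bloom’s Taxonomy level. It assumes numpy and scipy are available; the cross-tabulation counts in the second table are invented placeholders, since the abstract reports only p < 0.001.

import numpy as np
from scipy.stats import chi2_contingency

N = 304  # total MCQs in the study (from the abstract)
gpt_correct = round(0.737 * N)    # ChatGPT-4 overall score: 73.7%
stud_correct = round(0.667 * N)   # student overall score: 66.7%

# Comparison of overall scores as a 2x2 contingency table
# (rows: ChatGPT-4 vs students; columns: correct vs incorrect).
overall = np.array([
    [gpt_correct, N - gpt_correct],
    [stud_correct, N - stud_correct],
])
chi2, p, dof, _ = chi2_contingency(overall)
print(f"overall scores: chi2 = {chi2:.2f}, p = {p:.3f}")

# Test of independence between program level (rows) and Bloom's
# Taxonomy level (columns) for ChatGPT-4's correct answers.
# These counts are hypothetical, shaped only to mirror the reported
# pattern: remember-level questions concentrated in preclinical
# courses, evaluate-level questions in clinical courses.
bloom = np.array([
    # remember, understand/apply, evaluate
    [60, 35, 10],   # preclinical
    [12, 40, 55],   # clinical
])
chi2, p, dof, _ = chi2_contingency(bloom)
print(f"program x Bloom level: chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")

With counts this skewed, the second test returns p far below 0.001, matching the direction of the reported association; the actual magnitudes depend on the real cross-tabulation, which the abstract does not provide.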
Source journal: Advances in Medical Education and Practice (Education, Scientific Disciplines)
CiteScore: 3.10 · Self-citation rate: 10.00% · Articles published: 189 · Review time: 16 weeks