Below average ChatGPT performance in medical microbiology exam compared to university students

Impact factor: 1.9 · JCR Q2 · Education & Educational Research

Frontiers in Education · Pub Date: 2023-12-21 · DOI: 10.3389/feduc.2023.1333415

Malik Sallam, Khaled Al-Salahat


Citations: 1

Abstract



The transformative potential of artificial intelligence (AI) in higher education is evident, with conversational models like ChatGPT poised to reshape teaching and assessment methods. The rapid evolution of AI models requires continuous evaluation. AI-based models can offer personalized learning experiences but raise accuracy concerns. Multiple-choice questions (MCQs) are widely used for competency assessment. The aim of this study was to evaluate ChatGPT performance in medical microbiology MCQs compared to the students' performance.

The study employed an 80-MCQ dataset from a 2021 medical microbiology exam in the Medical Microbiology 2 course of the Doctor of Dental Surgery (DDS) program at the University of Jordan. The exam contained 40 midterm and 40 final MCQs, authored by a single instructor without copyright issues. The MCQs were categorized based on the revised Bloom's Taxonomy into four categories: Remember, Understand, Analyze, or Evaluate. Metrics, including facility index and discriminative efficiency, were derived from the performance of 153 DDS students on the midterm exam and 154 on the final exam. ChatGPT 3.5 was used to answer the questions, and its responses were assessed for correctness and clarity by two independent raters.

ChatGPT 3.5 correctly answered 64 out of 80 medical microbiology MCQs (80%) but scored below the student average (80.5/100 vs. 86.21/100). Incorrect ChatGPT responses were more common in MCQs with longer choices (p = 0.025). ChatGPT 3.5 performance varied across cognitive domains: Remember (88.5% correct), Understand (82.4% correct), Analyze (75% correct), Evaluate (72% correct), with no statistically significant differences (p = 0.492). Correct ChatGPT responses received significantly higher average clarity and correctness scores than incorrect responses.

The study findings emphasized the need for ongoing refinement and evaluation of ChatGPT performance. ChatGPT 3.5 showed the potential to correctly and clearly answer medical microbiology MCQs; nevertheless, its performance was below par compared to the students. Variability in ChatGPT performance across cognitive domains should be considered in future studies. The study insights could contribute to the ongoing evaluation of the role of AI-based models in educational assessment and help augment traditional methods in higher education.
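For readers unfamiliar with the item metrics named in the abstract, the facility index of an MCQ is commonly defined as the percentage of examinees who answered that item correctly. The following is a minimal Python sketch under that common definition; it is not the authors' analysis code, the function names are illustrative, and the toy response vector is hypothetical rather than taken from the study. Only the 64-out-of-80 figure comes from the abstract.

```python
from typing import Sequence


def facility_index(item_scores: Sequence[int]) -> float:
    """Percentage of examinees who answered one MCQ correctly.

    item_scores holds 1 for a correct answer and 0 for an incorrect one.
    This follows the common textbook definition, which is an assumption
    about how the study computed the metric.
    """
    if not item_scores:
        raise ValueError("item_scores must not be empty")
    return 100.0 * sum(item_scores) / len(item_scores)


def percent_correct(n_correct: int, n_items: int) -> float:
    """Overall percentage of items answered correctly."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return 100.0 * n_correct / n_items


# Hypothetical responses of four students to a single item (not study data).
toy_item = [1, 1, 0, 1]
print(facility_index(toy_item))   # 75.0

# Figure reported in the abstract: 64 of 80 MCQs answered correctly.
print(percent_correct(64, 80))    # 80.0
```

A higher facility index indicates an easier item. Discriminative efficiency is a more involved statistic and is deliberately left out of this sketch.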


Source journal