Answering Patterns in SBA Items: Students, GPT3.5, and Gemini

Olivia Ng, Dong Haur Phua, Jowe Chu, Lucy V E Wilding, Sreenivasulu Reddy Mogali, Jennifer Cleland

Medical Science Educator, 35(2), 629–632. Published online 2024-11-26. DOI: 10.1007/s40670-024-02232-4. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12058614/pdf/
Abstract
While large language models (LLMs) are often used to generate and answer exam questions, little work compares their performance across multiple iterations using item statistics. This study aims to fill that gap by investigating how LLMs answer single-best answer (SBA) questions and comparing their answering patterns to those of students. Forty-one SBA questions written for first-year medical students were put to GPT3.5 and Gemini, the most easily accessible free-to-use LLMs, across 100 iterations. Both LLMs exhibited more repetitive and clustered answering patterns than students, which is problematic because repeatedly selecting the same wrong option compounds errors. Distractor analysis revealed that students were better at weighing the multiple options presented in the SBA format. We found that these free-to-use LLMs are inferior to well-trained students or specialists in handling technical questions. We also highlight concerns about LLMs' contextual interpretation of these items and the need for human oversight in medical education assessment.
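The study's design (the same SBA item posed to an LLM 100 times, with the chosen options tallied for item statistics) can be illustrated with a minimal sketch. The paper does not publish its analysis code, so the function names below, including the `query_llm` placeholder, are hypothetical stand-ins for whichever API calls and statistics the authors actually used; this only shows how one might measure the repetitive, clustered answering the abstract describes.

```python
# Minimal sketch (assumptions labeled): repeat an SBA item across iterations,
# tally the chosen options, and summarize how concentrated the answers are.
from collections import Counter


def query_llm(item_text: str) -> str:
    """Hypothetical placeholder: send the SBA item to an LLM (e.g. GPT3.5 or
    Gemini) and return the single option letter (A-E) it selects."""
    raise NotImplementedError("Replace with a real API call.")


def answer_distribution(item_text: str, iterations: int = 100) -> Counter:
    """Query the same item `iterations` times and count each chosen option."""
    return Counter(query_llm(item_text) for _ in range(iterations))


def modal_share(dist: Counter) -> float:
    """Fraction of iterations landing on the most frequent option; values near
    1.0 indicate the repetitive, clustered answering pattern reported for the
    LLMs, whereas student cohorts spread responses across distractors."""
    total = sum(dist.values())
    return max(dist.values()) / total if total else 0.0
```

A distribution like `Counter({'C': 97, 'A': 3})` would give a modal share of 0.97: if option C is wrong, the same error is repeated almost every iteration, which is the compounding-of-mistakes concern raised above.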
About the Journal
Medical Science Educator is the successor to the journal JIAMSE and the peer-reviewed publication of the International Association of Medical Science Educators (IAMSE). The Journal offers everyone who teaches in healthcare current information to succeed in that task by publishing scholarly activities, opinions, and resources in medical science education. Published articles focus on teaching the sciences fundamental to modern medicine and health, including basic science education, clinical teaching, and the use of modern education technologies. The Journal provides readers with a better understanding of teaching and learning techniques in order to advance medical science education.