ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS

ACS Applied Bio Materials Pub Date : 2024-05-01 Epub Date: 2024-02-14 DOI:10.1007/s00228-024-03649-x

Yavuz Selim Kıyak, Özlem Coşkun, Işıl İrem Budakoğlu, Canan Uluoğlu

{"title":"ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.","authors":"Yavuz Selim Kıyak, Özlem Coşkun, Işıl İrem Budakoğlu, Canan Uluoğlu","doi":"10.1007/s00228-024-03649-x","DOIUrl":null,"url":null,"abstract":"Purpose: Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels.Methods: This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship carried out based-on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel, two of these multiple-choice questions were incorporated into a medical school exam without making any changes in the questions. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options.Results: Both questions exhibited acceptable levels of point-biserial correlation, which is higher than the threshold of 0.30 (0.41 and 0.39). However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants) while the other question had none.Conclusions: The findings showed that the questions can effectively differentiate between students who perform at high and low levels, which also point out the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items in order for enhancing the external validity of the results by gathering data from diverse institutions and settings.","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00228-024-03649-x","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/2/14 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}

引用次数: 0

Abstract

Purpose: Artificial intelligence, specifically large language models such as ChatGPT, offers valuable potential benefits in question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions using ChatGPT in terms of item difficulty and discrimination levels.

Methods: This study involved 99 fourth-year medical students who participated in a rational pharmacotherapy clerkship carried out based-on the WHO 6-Step Model. In response to a prompt that we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. Following an expert panel, two of these multiple-choice questions were incorporated into a medical school exam without making any changes in the questions. Based on the administration of the test, we evaluated their psychometric properties, including item difficulty, item discrimination (point-biserial correlation), and functionality of the options.

Results: Both questions exhibited acceptable levels of point-biserial correlation, which is higher than the threshold of 0.30 (0.41 and 0.39). However, one question had three non-functional options (options chosen by fewer than 5% of the exam participants) while the other question had none.

Conclusions: The findings showed that the questions can effectively differentiate between students who perform at high and low levels, which also point out the potential of ChatGPT as an artificial intelligence tool in test development. Future studies may use the prompt to generate items in order for enhancing the external validity of the results by gathering data from diverse institutions and settings.

查看原文本刊更多论文

用于生成选择题的 ChatGPT：在合理用药考试中使用人工智能自动生成题目的证据。

目的：人工智能，特别是大型语言模型，如 ChatGPT，为问题（题目）的编写提供了宝贵的潜在优势。本研究旨在确定使用 ChatGPT 生成基于病例的选择题在题目难度和区分度方面的可行性：本研究涉及 99 名四年级医学生，他们参加了基于世界卫生组织 6 步模式的合理药物治疗实习。根据我们提供的提示，ChatGPT 生成了 10 道关于高血压的案例选择题。经过专家小组讨论，其中两道选择题在不做任何改动的情况下被纳入医学院考试。根据测试的实施情况，我们对其心理测量特性进行了评估，包括题目难度、题目区分度（点-倍相关性）和选项的功能性：结果：两道试题的点-阶梯相关性都达到了可接受的水平，高于 0.30 的临界值（0.41 和 0.39）。然而，一道试题有三个非功能性选项（只有不到 5%的考试参与者选择了该选项），而另一道试题则没有：研究结果表明，试题能有效区分成绩好的学生和成绩差的学生，这也指出了 ChatGPT 作为人工智能工具在测试开发中的潜力。未来的研究可能会使用该提示生成题目，以便通过收集来自不同机构和环境的数据来提高结果的外部有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACS Applied Bio Materials Chemistry-Chemistry (all)

CiteScore

9.40

自引率

2.10%

发文量

464