ChatGPT for generating multiple-choice questions: Evidence on the use of artificial intelligence in automatic item generation for a rational pharmacotherapy exam.

IF 2.4; CAS Zone 3 (Medicine); JCR Q3, PHARMACOLOGY & PHARMACY
Yavuz Selim Kıyak, Özlem Coşkun, Işıl İrem Budakoğlu, Canan Uluoğlu
European Journal of Clinical Pharmacology; Journal Article; DOI: 10.1007/s00228-024-03649-x; publication date 2024-05-01 (published online 2024-02-14)
Citations: 0

Abstract

Purpose: Artificial intelligence, specifically large language models such as ChatGPT, offers potentially valuable benefits for question (item) writing. This study aimed to determine the feasibility of generating case-based multiple-choice questions with ChatGPT, judged by item difficulty and discrimination.

Methods: This study involved 99 fourth-year medical students who took part in a rational pharmacotherapy clerkship based on the WHO 6-Step Model. In response to a prompt we provided, ChatGPT generated ten case-based multiple-choice questions on hypertension. After expert panel review, two of these questions were included in a medical school exam without any modification. After the exam was administered, we evaluated the psychometric properties of the two items: item difficulty, item discrimination (point-biserial correlation), and option functionality.
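The two statistics named in the Methods can be illustrated with a minimal sketch. It assumes dichotomous (0/1) item scores and that each student's total exam score includes the item itself; the abstract does not say whether a corrected item-total correlation was used. Item difficulty p is the proportion answering correctly, and the point-biserial correlation is r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals of students who answered right and wrong, s is the population standard deviation of totals, and q = 1 - p. The function names and the five-student data are hypothetical, not taken from the study.

```python
import math
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Difficulty index p: proportion of examinees who answered correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals of
    examinees who got the item right and wrong, s is the population SD of the
    totals, and q = 1 - p. This equals Pearson's r between item and total.
    """
    p = item_difficulty(item_scores)
    if p in (0.0, 1.0):
        raise ValueError("undefined when everyone (or no one) answers correctly")
    s = pstdev(total_scores)
    if s == 0:
        raise ValueError("undefined when all total scores are equal")
    right = [t for x, t in zip(item_scores, total_scores) if x == 1]
    wrong = [t for x, t in zip(item_scores, total_scores) if x == 0]
    return (mean(right) - mean(wrong)) / s * math.sqrt(p * (1 - p))


if __name__ == "__main__":
    # Hypothetical toy data: five examinees, not results from the study.
    item = [1, 1, 1, 0, 0]
    totals = [9, 8, 6, 5, 2]
    print(item_difficulty(item), round(point_biserial(item, totals), 3))  # 0.6 0.833
```

Because r_pb is just Pearson's r with a binary variable, the same value can be cross-checked with any general correlation routine.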

Results: Both questions showed acceptable discrimination, with point-biserial correlations of 0.41 and 0.39, above the 0.30 threshold. However, one question had three non-functional options (options chosen by fewer than 5% of examinees), while the other had none.
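The two cut-offs in the Results can be expressed as a simple item-review check. This is a minimal sketch, assuming responses are recorded as option letters and that "above 0.30" means strictly greater than 0.30; the function names, the 0.35 correlation, and the response counts in the usage are hypothetical and are not data from the study.

```python
from collections import Counter

DISCRIMINATION_CUTOFF = 0.30  # point-biserial threshold cited in the Results
NONFUNCTIONAL_SHARE = 0.05    # distractor chosen by fewer than 5% of examinees


def nonfunctional_distractors(responses, options, key, share=NONFUNCTIONAL_SHARE):
    """Return the distractors (wrong options) chosen by fewer than `share` of examinees."""
    if not responses:
        raise ValueError("no responses recorded")
    counts = Counter(responses)  # options nobody chose count as 0
    n = len(responses)
    return [opt for opt in options if opt != key and counts[opt] / n < share]


def review_item(r_pb, responses, options, key):
    """Summarise one item against both cut-offs."""
    return {
        "discriminates": r_pb > DISCRIMINATION_CUTOFF,
        "nonfunctional": nonfunctional_distractors(responses, options, key),
    }


if __name__ == "__main__":
    # Hypothetical responses from 40 examinees to a five-option item keyed "A".
    responses = ["A"] * 30 + ["B"] * 6 + ["C"] * 3 + ["D"] * 1
    print(review_item(0.35, responses, ["A", "B", "C", "D", "E"], key="A"))
    # {'discriminates': True, 'nonfunctional': ['D', 'E']}
```

Here option D (1 of 40, 2.5%) and option E (never chosen) fall below the 5% line, while C (3 of 40, 7.5%) stays functional.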

Conclusions: The findings show that the questions can effectively differentiate between high- and low-performing students, which points to the potential of ChatGPT as an artificial intelligence tool in test development. Future studies could use the same prompt to generate items and collect data from diverse institutions and settings, thereby strengthening the external validity of the results.

Source journal: European Journal of Clinical Pharmacology
CiteScore: 5.40
Self-citation rate: 3.40%
Articles published: 170
Review time: 3-8 weeks
About the journal: The European Journal of Clinical Pharmacology publishes original papers on all aspects of clinical pharmacology and drug therapy in humans. Manuscripts are welcomed on the following topics: therapeutic trials, pharmacokinetics/pharmacodynamics, pharmacogenetics, drug metabolism, adverse drug reactions, drug interactions, all aspects of drug development, developments relating to teaching in clinical pharmacology, pharmacoepidemiology, and matters relating to the rational prescribing and safe use of drugs. Methodological contributions relevant to these topics are also welcomed. Data from animal experiments are accepted only in the context of original data in humans reported in the same paper. EJCP will only consider manuscripts describing the frequency of allelic variants in different populations if this information is linked to functional data or new interesting variants. Highly relevant differences in frequency with a major impact on drug therapy for the respective population may be submitted as a letter to the editor. Straightforward phase I pharmacokinetic or pharmacodynamic studies forming part of new drug development will only be considered for publication if the paper involves (a) a compound that is interesting and new in some basic or fundamental way, (b) methods that are original in some basic sense, (c) a highly unexpected outcome, or (d) conclusions that are scientifically novel in some basic or fundamental sense.