Exploratory Assessment of GPT-4's Effectiveness in Generating Valid Exam Items in Pharmacy Education

Impact Factor 3.8 · CAS Quartile 4 (Education) · JCR Q1 (Education, Scientific Disciplines)
Benjamin Shultz, Robert J. DiDomenico, Kristen Goliak, Jeffrey Mucksavage
DOI: 10.1016/j.ajpe.2025.101405
American Journal of Pharmaceutical Education, Volume 89, Issue 5, Article 101405 (published 2025-05-01)
URL: https://www.sciencedirect.com/science/article/pii/S0002945925000506
Citations: 0

Abstract

Objective

To evaluate the effectiveness of GPT-4 in generating valid multiple-choice exam items for assessing therapeutic knowledge in pharmacy education.

Methods

A custom GPT application was developed to create 60 case-based items from a pharmacotherapy textbook. Nine subject matter experts reviewed items for content validity, difficulty, and quality. Valid items were compiled into a 38-question exam administered to 46 fourth-year pharmacy students. Classical test theory and Rasch analysis were used to assess psychometric properties.
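The classical test theory statistics used here (item difficulty as the proportion of correct responses, and point-biserial discrimination, mentioned in the Results) can be sketched as follows. This is a minimal illustration on simulated data, not the study's actual analysis; the matrix shape mirrors the study's 46 students and 38 items, but all values are hypothetical.

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows = students, columns = items.
# Shape mirrors the study (46 students, 38 items); the data are simulated.
rng = np.random.default_rng(0)
responses = (rng.random((46, 38)) < 0.7).astype(int)

total = responses.sum(axis=1)  # each student's total exam score

# Item difficulty: proportion of students answering the item correctly
# (a commonly cited acceptable range is roughly 0.30-0.80).
difficulty = responses.mean(axis=0)

# Corrected point-biserial: correlate each item with the total score
# excluding that item, so the item does not inflate its own correlation.
def point_biserial(item, total):
    rest = total - item
    return np.corrcoef(item, rest)[0, 1]

discrimination = np.array(
    [point_biserial(responses[:, j], total) for j in range(responses.shape[1])]
)
```

In this framing, "fewer than half fell within the recommended difficulty range" means most values in `difficulty` sat outside the conventional window, even when the point-biserial values in `discrimination` were acceptable.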

Results

Of 60 generated items, 38 met content validity requirements, with only 6 accepted without revisions. The exam demonstrated moderate reliability and correlated well with a prior cumulative therapeutics exam. Classical item analysis revealed that most items had acceptable point biserial correlations, though fewer than half fell within the recommended difficulty range. Rasch analysis indicated potential multidimensionality and suboptimal targeting of item difficulty to student ability.
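The "suboptimal targeting" finding refers to item difficulties sitting far from the bulk of student abilities on the Rasch scale. A minimal sketch of the dichotomous Rasch model, with hypothetical ability and difficulty values (not the study's estimates):

```python
import numpy as np

def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(correct | ability theta, item difficulty b)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Well-targeted items put this probability near 0.5 for typical students;
# off-target items push it toward 0 or 1 and yield little information.
theta = 0.0             # hypothetical average student ability (logits)
easy, hard = -2.0, 2.0  # hypothetical off-target item difficulties

print(rasch_probability(theta, easy))  # ~0.88: item too easy for this student
print(rasch_probability(theta, hard))  # ~0.12: item too hard
```

Under this model, a test whose item difficulties cluster away from the student ability distribution measures that cohort imprecisely, which is the targeting problem the Rasch analysis flagged.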

Conclusion

GPT-4 offers a preliminary step toward generating exam content in pharmacy education but has clear limitations that require further investigation and validation. Substantial human oversight and psychometric evaluation are necessary to ensure clinical realism and appropriate difficulty. Future research with larger samples is needed to further validate the effectiveness of artificial intelligence in item generation for high-stakes assessments in pharmacy education.
Source journal metrics: CiteScore 4.30 · Self-citation rate 15.20% · Articles published per year: 114
Journal description: The Journal accepts unsolicited manuscripts that have not been published and are not under consideration for publication elsewhere. The Journal only considers material related to pharmaceutical education for publication. Authors must prepare manuscripts to conform to the Journal style (Author Instructions). All manuscripts are subject to peer review and approval by the editor prior to acceptance for publication. Reviewers are assigned by the editor with the advice of the editorial board as needed. Manuscripts are submitted and processed online (Submit a Manuscript) using Editorial Manager, an online manuscript tracking system that facilitates communication between the editorial office, editor, associate editors, reviewers, and authors. After a manuscript is accepted, it is scheduled for publication in an upcoming issue of the Journal. All manuscripts are formatted and copyedited, and returned to the author for review and approval of the changes. Approximately 2 weeks prior to publication, the author receives an electronic proof of the article for final review and approval. Authors are not assessed page charges for publication.