Benjamin Shultz, Robert J. DiDomenico, Kristen Goliak, Jeffrey Mucksavage
American Journal of Pharmaceutical Education, Volume 89, Issue 5, Article 101405 (May 2025). DOI: 10.1016/j.ajpe.2025.101405
Exploratory Assessment of GPT-4's Effectiveness in Generating Valid Exam Items in Pharmacy Education
Objective
To evaluate the effectiveness of GPT-4 in generating valid multiple-choice exam items for assessing therapeutic knowledge in pharmacy education.
Methods
A custom GPT application was developed to create 60 case-based items from a pharmacotherapy textbook. Nine subject matter experts reviewed items for content validity, difficulty, and quality. Valid items were compiled into a 38-question exam administered to 46 fourth-year pharmacy students. Classical test theory and Rasch analysis were used to assess psychometric properties.
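The Rasch analysis referenced above models the probability of a correct response as a function of the gap between a student's ability and an item's difficulty. A minimal sketch of that model (standard Rasch formulation; the parameter values below are illustrative, not taken from the study):

```python
import numpy as np

# Rasch model: probability that a student of ability theta answers an
# item of difficulty b correctly (both on the same logit scale).
def rasch_prob(theta, b):
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# A well-targeted exam places item difficulties near the cohort's ability:
# when theta == b, the expected probability of success is exactly 0.50.
print(rasch_prob(0.0, 0.0))  # 0.5
print(rasch_prob(1.0, 0.0))  # ~0.73: this item is easy for this student
```

Targeting, as assessed in the Results below, asks how well the distribution of item difficulties `b` overlaps the distribution of student abilities `theta`.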
Results
Of 60 generated items, 38 met content validity requirements, with only 6 accepted without revisions. The exam demonstrated moderate reliability and correlated well with a prior cumulative therapeutics exam. Classical item analysis revealed that most items had acceptable point biserial correlations, though fewer than half fell within the recommended difficulty range. Rasch analysis indicated potential multidimensionality and suboptimal targeting of item difficulty to student ability.
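The classical item statistics reported here, difficulty (proportion correct) and corrected point-biserial discrimination, can be computed directly from a 0/1 response matrix. A sketch using simulated data with the study's dimensions (46 students, 38 items); the random responses are purely illustrative:

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows = students, columns = items.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(46, 38))

total = responses.sum(axis=1)

# Item difficulty = proportion of students answering correctly.
difficulty = responses.mean(axis=0)

# Corrected point-biserial: correlate each item with the total score
# excluding that item, so the item does not inflate its own correlation.
rest = total[:, None] - responses
pbis = np.array([
    np.corrcoef(responses[:, j], rest[:, j])[0, 1]
    for j in range(responses.shape[1])
])

print(difficulty.round(2))
print(pbis.round(2))
```

With real exam data, items falling outside a conventional difficulty window (often roughly 0.30 to 0.80) or with low point-biserial values would be flagged for review, which is the kind of screening the item analysis above describes.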
Conclusion
GPT-4 offers a preliminary step toward generating exam content in pharmacy education but has clear limitations that require further investigation and validation. Substantial human oversight and psychometric evaluation are necessary to ensure clinical realism and appropriate difficulty. Future research with larger samples is needed to further validate the effectiveness of artificial intelligence in item generation for high-stakes assessments in pharmacy education.
Journal Description:
The Journal accepts unsolicited manuscripts that have not been published and are not under consideration for publication elsewhere. The Journal only considers material related to pharmaceutical education for publication. Authors must prepare manuscripts to conform to the Journal style (Author Instructions). All manuscripts are subject to peer review and approval by the editor prior to acceptance for publication. Reviewers are assigned by the editor with the advice of the editorial board as needed. Manuscripts are submitted and processed online (Submit a Manuscript) using Editorial Manager, an online manuscript tracking system that facilitates communication between the editorial office, editor, associate editors, reviewers, and authors.
After a manuscript is accepted, it is scheduled for publication in an upcoming issue of the Journal. All manuscripts are formatted and copyedited, and returned to the author for review and approval of the changes. Approximately 2 weeks prior to publication, the author receives an electronic proof of the article for final review and approval. Authors are not assessed page charges for publication.