Evaluating the value of AI-generated questions for USMLE step 1 preparation: A study using ChatGPT-3.5.

IF 3.3 · CAS Tier 2 (Education) · Q1 EDUCATION, SCIENTIFIC DISCIPLINES
Medical Teacher · Pub Date: 2025-10-01 · Epub Date: 2025-03-27 · DOI: 10.1080/0142159X.2025.2478872
Alan Balu, Stefan T Prvulovic, Claudia Fernandez Perez, Alexander Kim, Daniel A Donoho, Gregory Keating
{"title":"Evaluating the value of AI-generated questions for USMLE step 1 preparation: A study using ChatGPT-3.5.","authors":"Alan Balu, Stefan T Prvulovic, Claudia Fernandez Perez, Alexander Kim, Daniel A Donoho, Gregory Keating","doi":"10.1080/0142159X.2025.2478872","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Students are increasingly relying on artificial intelligence (AI) for medical education and exam preparation. However, the factual accuracy and content distribution of AI-generated exam questions for self-assessment have not been systematically investigated.</p><p><strong>Methods: </strong>Curated prompts were created to generate multiple-choice questions matching the USMLE Step 1 examination style. We utilized ChatGPT-3.5 to generate 50 questions and answers based upon each prompt style. We manually examined output for factual accuracy, Bloom's Taxonomy, and category within the USMLE Step 1 content outline.</p><p><strong>Results: </strong>ChatGPT-3.5 generated 150 multiple-choice case-style questions and selected an answer. Overall, 83% of generated multiple questions had no factual inaccuracies and 15% contained one to two factual inaccuracies. With simple prompting, common themes included deep venous thrombosis, myocardial infarction, and thyroid disease. Topic diversity improved by separating content topic generation from question generation, and specificity to Step 1 increased by indicating that \"treatment\" questions were not desired.</p><p><strong>Conclusion: </strong>We demonstrate that ChatGPT-3.5 can successfully generate Step 1 style questions with reasonable factual accuracy, and this method may be used by medical students preparing for USMLE examinations. While AI-generated questions demonstrated adequate factual accuracy, targeted prompting techniques should be used to overcome ChatGPT's bias towards particular medical conditions.</p>","PeriodicalId":18643,"journal":{"name":"Medical Teacher","volume":" ","pages":"1645-1653"},"PeriodicalIF":3.3000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical Teacher","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1080/0142159X.2025.2478872","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/27 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: Students are increasingly relying on artificial intelligence (AI) for medical education and exam preparation. However, the factual accuracy and content distribution of AI-generated exam questions for self-assessment have not been systematically investigated.

Methods: Curated prompts were created to generate multiple-choice questions matching the USMLE Step 1 examination style. We used ChatGPT-3.5 to generate 50 questions and answers for each prompt style. We manually examined the output for factual accuracy, Bloom's Taxonomy level, and category within the USMLE Step 1 content outline.
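
The paper does not publish its exact prompts or tooling, so the following is only a minimal sketch of the generation step it describes, assuming the OpenAI Python SDK with the gpt-3.5-turbo model as a stand-in for the ChatGPT-3.5 interface; the prompt wording and function names are hypothetical.

```python
# Hypothetical sketch: prompt text, model id, and batch size are assumptions,
# not the authors' published method.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STEP1_PROMPT = (
    "Write one USMLE Step 1 style multiple-choice question as a clinical "
    "vignette with five answer options (A-E). State the correct answer "
    "and give a one-sentence explanation."
)

def generate_questions(n: int = 50) -> list[str]:
    """Generate n question/answer pairs, one API call per question."""
    questions = []
    for _ in range(n):
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": STEP1_PROMPT}],
        )
        questions.append(response.choices[0].message.content)
    return questions
```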

Results: ChatGPT-3.5 generated 150 multiple-choice, case-style questions and selected an answer for each. Overall, 83% of the generated multiple-choice questions had no factual inaccuracies and 15% contained one to two factual inaccuracies. With simple prompting, common themes included deep venous thrombosis, myocardial infarction, and thyroid disease. Topic diversity improved when content topic generation was separated from question generation, and specificity to Step 1 increased when prompts indicated that "treatment" questions were not desired.
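
A minimal sketch of the two-stage prompting the results describe — generating topics first, then one question per topic while excluding treatment questions — assuming the same hypothetical OpenAI SDK setup as above; the prompt phrasing is illustrative, not the authors' published wording.

```python
# Hypothetical two-stage prompting sketch; all prompt text is assumed.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Stage 1: topic generation, decoupled from question writing,
# to reduce the model's drift toward a few common conditions.
topics = ask(
    "List 50 distinct USMLE Step 1 content outline topics, one per line."
).splitlines()

# Stage 2: one question per topic, steering away from management questions
# to keep items Step 1 specific.
questions = [
    ask(
        f"Write one USMLE Step 1 style multiple-choice vignette about "
        f"{topic}. Do not write a treatment or management question. "
        f"Provide options A-E and state the correct answer."
    )
    for topic in topics
]
```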

Conclusion: We demonstrate that ChatGPT-3.5 can generate Step 1-style questions with reasonable factual accuracy, a method medical students may use when preparing for USMLE examinations. While AI-generated questions demonstrated adequate factual accuracy, targeted prompting techniques should be used to overcome ChatGPT's bias towards particular medical conditions.

Source journal
Medical Teacher (Medicine: Health Care)
CiteScore: 7.80
Self-citation rate: 8.50%
Annual articles: 396
Review turnaround: 3-6 weeks
About the journal: Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up to date with developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum, from medical procedures to policy changes in health care provision, is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.