QUEST-AI: A System for Question Generation, Verification, and Refinement using AI for USMLE-Style Exams.


Pacific Symposium on Biocomputing, vol. 30, pp. 54-69. Pub date: 2025-01-01. DOI: 10.1142/9789819807024_0005

Suhana Bedi, Scott L Fleming, Chia-Chun Chiang, Keith Morse, Aswathi Kumar, Birju Patel, Jenelle A Jindal, Conor Davenport, Craig Yamaguchi, Nigam H Shah

Abstract

The United States Medical Licensing Examination (USMLE) is a critical step in assessing the competence of future physicians, yet creating exam questions and study materials is both time-consuming and costly. While Large Language Models (LLMs) such as OpenAI's GPT-4 have demonstrated proficiency in answering medical exam questions, their potential for generating such questions remains underexplored. This study presents QUEST-AI, a novel system that uses LLMs to (1) generate USMLE-style questions, (2) identify and flag incorrect questions, and (3) correct errors in the flagged questions. We evaluated the system's output by constructing a test set of 50 LLM-generated questions mixed with 50 human-generated questions and conducting a two-part assessment with three physicians and two medical students. The assessors attempted to distinguish LLM-generated from human-generated questions and evaluated the validity of the LLM-generated content. A panel of three clinicians deemed the majority of QUEST-AI's questions valid, and performance on LLM-generated questions correlated strongly with performance on human-generated questions. This pioneering application of LLMs in medical education could make developing USMLE-style exam content substantially easier and more efficient, offering a cost-effective and accessible alternative for exam preparation.
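The three stages named in the abstract (generate, flag, correct) can be pictured as a simple loop. The sketch below is not the authors' implementation: the `Question` and `PipelineResult` schemas, the function names `run_pipeline`, `toy_generate`, `toy_verify`, and `toy_correct`, and the `max_rounds` cap are all hypothetical. The rule-based toy functions merely stand in for the LLM calls the abstract describes, so the control flow can run offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Question:
    """A single-best-answer item (hypothetical schema, not QUEST-AI's)."""
    stem: str
    options: List[str]
    answer: str  # text of the keyed correct option


@dataclass
class PipelineResult:
    question: Question  # final version after any corrections
    flagged: bool       # True if the verifier raised an issue on the first draft
    rounds: int         # number of correction rounds applied
    passed: bool        # True if the verifier accepted the final version


# Hypothetical step signatures; in QUEST-AI each step is LLM-based.
GenerateFn = Callable[[str], Question]
VerifyFn = Callable[[Question], Optional[str]]  # issue description, or None if valid
CorrectFn = Callable[[Question, str], Question]


def run_pipeline(topics: List[str], generate: GenerateFn, verify: VerifyFn,
                 correct: CorrectFn, max_rounds: int = 2) -> List[PipelineResult]:
    results = []
    for topic in topics:
        question = generate(topic)  # (1) generate a draft item
        issue = verify(question)    # (2) flag it if the verifier finds a problem
        flagged = issue is not None
        rounds = 0
        # (3) correct flagged items, re-checking after each fix (assumed, capped loop)
        while issue is not None and rounds < max_rounds:
            question = correct(question, issue)
            rounds += 1
            issue = verify(question)
        results.append(PipelineResult(question, flagged, rounds, issue is None))
    return results


# Rule-based toy stand-ins for the LLM calls, only so the sketch runs offline.
def toy_generate(topic: str) -> Question:
    options = ["Option A", "Option B", "Option C", "Option D"]
    # Deliberately mis-key the "renal" item so the flag/correct path is exercised.
    answer = "Option E" if topic == "renal" else "Option A"
    return Question(f"Clinical vignette about {topic}", options, answer)


def toy_verify(q: Question) -> Optional[str]:
    if q.answer not in q.options:
        return "keyed answer is not among the options"
    if len(set(q.options)) != len(q.options):
        return "duplicate options"
    return None


def toy_correct(q: Question, issue: str) -> Question:
    if issue == "keyed answer is not among the options":
        # Swap the last distractor for the keyed answer.
        return Question(q.stem, q.options[:-1] + [q.answer], q.answer)
    return q  # this toy cannot fix any other issue


if __name__ == "__main__":
    for r in run_pipeline(["cardiology", "renal"], toy_generate, toy_verify, toy_correct):
        print(f"{r.question.stem}: flagged={r.flagged}, rounds={r.rounds}, passed={r.passed}")
```

Run as a script, this reports the cardiology item as unflagged and the renal item as flagged, fixed in one round, and passing. Capping the loop with `max_rounds` keeps a verifier and corrector that disagree from cycling forever; the abstract does not say whether QUEST-AI re-verifies corrected questions, so that re-check is an assumption of this sketch.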
