Use of AI (GPT-4)-generated multiple-choice questions for the examination of surgical subspecialty residents: Report of feasibility and psychometric analysis.

IF 1.9 · CAS Tier 4 (Medicine) · JCR Q3, UROLOGY & NEPHROLOGY
Jin Kyu Kim, Michael Chua, Armando Lorenzo, Mandy Rickard, Laura Andreacchi, Michael Kim, Douglas Cheung, Yonah Krakowsky, Jason Y Lee
{"title":"Use of AI (GPT-4)-generated multiple-choice questions for the examination of surgical subspecialty residents: Report of feasibility and psychometric analysis.","authors":"Jin Kyu Kim, Michael Chua, Armando Lorenzo, Mandy Rickard, Laura Andreacchi, Michael Kim, Douglas Cheung, Yonah Krakowsky, Jason Y Lee","doi":"10.5489/cuaj.9020","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Multiple-choice questions (MCQs) are essential in medical education and widely used by licensing bodies. They are traditionally created with intensive human effort to ensure validity. Recent advances in AI, particularly large language models (LLMs), offer the potential to streamline this process. This study aimed to develop and test a GPT-4 model with customized instructions for generating MCQs to assess urology residents.</p><p><strong>Methods: </strong>A GPT-4 model was embedded using guidelines from medical licensing bodies and reference materials specific to urology. This model was tasked with generating MCQs designed to mimic the format and content of the 2023 urology examination outlined by the Royal College of Physicians and Surgeons of Canada (RCPSC). Following generation, a selection of MCQs underwent expert review for validity and suitability.</p><p><strong>Results: </strong>From an initial set of 123 generated MCQs, 60 were chosen for inclusion in an exam administered to 15 urology residents at the University of Toronto. The exam results demonstrated a general increasing performance with level of training cohorts, suggesting the MCQs' ability to effectively discriminate knowledge levels among residents. The majority (33/60) of the questions had discriminatory value that appeared acceptable (discriminatory index 0.2-0.4) or excellent (discriminatory index >0.4).</p><p><strong>Conclusions: </strong>This study highlights AI-driven models like GPT-4 as efficient tools to aid with MCQ generation in medical education assessments. By automating MCQ creation while maintaining quality standards, AI can expedite processes. Future research should focus on refining AI applications in education to optimize assessments and enhance medical training and certification outcomes.</p>","PeriodicalId":50613,"journal":{"name":"Cuaj-Canadian Urological Association Journal","volume":" ","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cuaj-Canadian Urological Association Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5489/cuaj.9020","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: Multiple-choice questions (MCQs) are essential in medical education and widely used by licensing bodies. They are traditionally created with intensive human effort to ensure validity. Recent advances in AI, particularly large language models (LLMs), offer the potential to streamline this process. This study aimed to develop and test a GPT-4 model with customized instructions for generating MCQs to assess urology residents.

Methods: A GPT-4 model was customized by embedding guidelines from medical licensing bodies and urology-specific reference materials. The model was tasked with generating MCQs that mimicked the format and content of the 2023 urology examination outlined by the Royal College of Physicians and Surgeons of Canada (RCPSC). Following generation, a selection of MCQs underwent expert review for validity and suitability.
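
The abstract does not publish the authors' custom instructions or tooling. As a rough sketch of how such a generation step could be wired up, assuming the OpenAI Python client, the `gpt-4` model name, and a hypothetical item-writing system prompt (none of which are specified in the paper):

```python
# Sketch only: the authors' actual custom instructions and embedded
# reference materials are not published in this abstract. All prompt
# text below is hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are an item writer for a Royal College urology examination. "
    "Write one single-best-answer multiple-choice question with a stem, "
    "five options (A-E), the correct answer, and a short rationale. "
    "Follow standard licensing-body item-writing guidelines: no "
    "negatively worded stems, homogeneous options, no 'all of the above'."
)

def generate_mcq(topic: str) -> str:
    """Ask GPT-4 for one exam-style MCQ on the given urology topic."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Topic: {topic}"},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

print(generate_mcq("management of low-grade non-muscle-invasive bladder cancer"))
```

In the study's workflow, output like this would then pass through expert review before any item reached an exam.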

Results: From an initial set of 123 generated MCQs, 60 were chosen for inclusion in an exam administered to 15 urology residents at the University of Toronto. Exam performance generally increased with level of training, suggesting the MCQs effectively discriminated knowledge levels among residents. The majority (33/60) of questions had acceptable (discrimination index 0.2-0.4) or excellent (discrimination index >0.4) discriminatory value.
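
The abstract does not define how the discrimination index was computed; a common convention is the upper-minus-lower method, where the index is the proportion of top scorers answering the item correctly minus the proportion of bottom scorers. A minimal sketch under that assumption (the 27% group split is a textbook default, not taken from the paper):

```python
# Sketch of the classical upper-minus-lower discrimination index.
# The paper's exact computation is not specified in the abstract;
# groups here are the top and bottom 27% of examinees by total score.
import math

def discrimination_index(item_correct: list[bool], total_scores: list[int]) -> float:
    """item_correct[i]: whether examinee i answered this item correctly.
    total_scores[i]: examinee i's total exam score."""
    n = len(total_scores)
    k = max(1, math.ceil(0.27 * n))  # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    p_high = sum(item_correct[i] for i in high) / k
    p_low = sum(item_correct[i] for i in low) / k
    return p_high - p_low

# Toy example: strong scorers get the item right, weak scorers do not,
# so the index lands well above 0.4 ("excellent").
correct = [True, True, True, False, True, False, False, False, True, False]
scores  = [58, 55, 52, 40, 50, 35, 30, 28, 54, 33]
print(round(discrimination_index(correct, scores), 2))
```

Under this convention, an index of 0.2-0.4 is commonly labeled acceptable and above 0.4 excellent, matching the thresholds reported above.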

Conclusions: This study highlights AI-driven models like GPT-4 as efficient tools to aid MCQ generation for medical education assessments. By automating MCQ creation while maintaining quality standards, AI can expedite exam development. Future research should focus on refining AI applications in education to optimize assessments and enhance medical training and certification outcomes.

Source journal

Cuaj-Canadian Urological Association Journal (Medicine - Urology & Nephrology)
CiteScore: 2.80
Self-citation rate: 10.50%
Articles per year: 167
Review time: >12 weeks
Journal description: CUAJ is a peer-reviewed, open-access journal devoted to promoting the highest standard of urological patient care through the publication of timely, relevant, evidence-based research and advocacy information.