Large language models for generating medical examinations: systematic review

medRxiv - Medical Education. Pub Date: 2024-01-09. DOI: 10.1101/2024.01.06.24300920

Yaara R Artsi, Vera Sorin, Eli Konen, Benjamin S Glicksberg, Girish Nadkarni, Eyal Klang



Abstract

Purpose: Writing multiple-choice questions (MCQs) for medical exams is challenging. It requires extensive medical knowledge, time, and effort from medical educators. This systematic review focuses on the application of large language models (LLMs) to generating medical MCQs.

Methods: The authors searched for studies published up to November 2023. Search terms focused on LLM-generated MCQs for medical examinations. MEDLINE was used as the search database.

Results: Overall, eight studies published between April 2023 and October 2023 were included. Six studies used Chat-GPT 3.5, while two employed GPT 4. Five studies showed that LLMs can produce competent questions valid for medical exams. Three studies used LLMs to write medical questions but did not evaluate the validity of the questions. One study conducted a comparative analysis of different models. One other study compared LLM-generated questions with those written by humans. All studies presented faulty questions that were deemed inappropriate for medical exams. Some questions required additional modifications in order to qualify.

Conclusions: LLMs can be used to write MCQs for medical examinations. However, their limitations cannot be ignored. Further study in this field is essential, and more conclusive evidence is needed. Until then, LLMs may serve as a supplementary tool for writing medical examinations.


