GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.

Impact factor: 9.3; JCR Q1, Education, Scientific Disciplines

Journal of Educational Evaluation for Health Professions. Pub Date: 2024-01-01. Epub Date: 2024-08-20. DOI: 10.3352/jeehp.2024.21.21

Sebastian Ebel, Constantin Ehrengut, Timm Denecke, Holger Gößmann, Anne Bettina Beeskow


Citations: 0

Abstract

Purpose: This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, could pass a simulated written European Board of Interventional Radiology (EBIR) exam and whether GPT-4o could be used to train medical students and interventional radiologists at different levels of expertise by generating exam items on interventional radiology.

Methods: GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.

Results: GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o-generated items at the student level correctly. For the 50 GPT-4o-generated items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants passed the GPT-4o-generated items at the student level, whereas the EBIR holder passed the GPT-4o-generated items at the EBIR level. Two of the 150 items generated by GPT-4o (1.3%) were assessed as implausible.
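
The percentages in the Results follow directly from the reported counts. As a minimal sketch for rechecking that arithmetic (the helper names `percent_correct` and `items_correct` are illustrative and not taken from the study), the figures can be recomputed in Python:

```python
def percent_correct(correct: int, total: int) -> float:
    """Share of correctly answered items, as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 1)


def items_correct(percent: float, total: int) -> int:
    """Number of items implied by a reported percentage of a fixed-size item set."""
    return round(percent * total / 100)


# GPT-4o on the CIRSE Prep items, using the counts reported in the Results.
gpt4o_cirse = percent_correct(248, 370)

# Reported percentages on the 50 GPT-4o-generated EBIR-level items,
# converted back to item counts.
ebir_level = {
    "medical student": 32.0,
    "resident": 44.0,
    "consultant": 48.0,
    "EBIR holder": 66.0,
}
ebir_counts = {who: items_correct(pct, 50) for who, pct in ebir_level.items()}

# Share of generated items judged implausible (2 of 150).
implausible = percent_correct(2, 150)

print(gpt4o_cirse, ebir_counts, implausible)
```

Converting the percentages back to item counts makes the small sample size explicit: on a 50-item set, each item is worth 2 percentage points.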

Conclusion: GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.


Source journal

Journal of Educational Evaluation for Health Professions (Education, Scientific Disciplines)

CiteScore: 9.60

Self-citation rate: 9.10%

Review time: 5 weeks

About the journal: Journal of Educational Evaluation for Health Professions aims to provide readers with state-of-the-art practical information on educational evaluation for the health professions in order to improve the quality of undergraduate, graduate, and continuing education. It specializes in educational evaluation, including the application of measurement theory to medical and health education, the promotion of high-stakes examinations such as national licensing examinations, the improvement of nationwide or international education programs, computer-based testing, computerized adaptive testing, and medical and health regulatory bodies. Its field comprises a variety of professions that address public health, including but not limited to: care workers, dental hygienists, dental technicians, dentists, dietitians, emergency medical technicians, health educators, medical record technicians, medical technologists, midwives, nurses, nursing aides, occupational therapists, opticians, Oriental medical doctors, Oriental medicine dispensers, Oriental pharmacists, pharmacists, physical therapists, physicians, prosthetists and orthotists, radiological technologists, rehabilitation counselors, sanitary technicians, and speech-language therapists.