Anatomy exam model for the circulatory and respiratory systems using GPT-4: a medical school study.

IF 1.4 · JCR Q2 (Medicine) · CAS Zone 4 (Medicine)
Ayla Tekin, Nizameddin Fatih Karamus, Tuncay Çolak
DOI: 10.1007/s00276-025-03667-z
Journal: Surgical and Radiologic Anatomy, vol. 47, no. 1, p. 158
Published: 2025-06-10 (Journal Article)
Citations: 0

Abstract

Purpose: The study aimed to evaluate the effectiveness of anatomy multiple-choice questions (MCQs) generated by GPT-4, focusing on their methodological appropriateness and their alignment with the cognitive levels defined by Bloom's revised taxonomy, with the goal of enhancing assessment.

Methods: The assessment questions for medical students were generated with GPT-4: 240 MCQs organized into subcategories consistent with Bloom's revised taxonomy. When designing the prompts, details about the lesson's purpose, the learning objectives, and students' prior experience were included to ensure the questions were contextually appropriate. Thirty MCQs were randomly selected from the generated pool for testing. A total of 280 students sat the examination, from which the difficulty index of each MCQ, the item discrimination index, and the overall test difficulty were computed. Expert anatomists examined the taxonomic accuracy of GPT-4's questions.
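The paper does not publish its exact prompts, so as a rough illustration of the approach described above (embedding the lesson's purpose, objectives, and prior experience in the prompt), here is a minimal sketch; every field name and all wording are hypothetical, not the study's actual prompt:

```python
def build_mcq_prompt(topic, objective, prior_knowledge, bloom_level, n_questions=5):
    """Assemble a GPT-4 prompt for anatomy MCQs targeting one level of
    Bloom's revised taxonomy. Wording is illustrative only."""
    return (
        f"You are writing a medical school anatomy exam on {topic}.\n"
        f"Lesson objective: {objective}\n"
        f"Students' prior experience: {prior_knowledge}\n"
        f"Write {n_questions} multiple-choice questions (one correct answer, "
        f"four distractors) at the '{bloom_level}' level of "
        f"Bloom's revised taxonomy."
    )
```

A prompt like this would be sent once per Bloom level and topic to accumulate the question pool.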

Results: Students achieved a median score of 50 (range, 36.67-60) points on the test. The test's internal consistency, assessed by KR-20, was 0.737. The average difficulty of the test was 0.5012. Difficulty and discrimination indices were computed for each AI-generated question. Expert anatomists' taxonomy-based classifications matched GPT-4's in 26.6% of cases. Meanwhile, 80.9% of students found the questions clear, and 85.8% expressed interest in retaking the assessment exam.
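The item statistics reported above are standard: the difficulty index is the proportion of students answering an item correctly, the discrimination index is commonly the difference in that proportion between the top and bottom 27% of scorers, and KR-20 is (k/(k-1))·(1 − Σp·q / σ²_total) for k dichotomous items. A minimal sketch of these computations (the study's exact procedure may differ, e.g. in group-split fraction or variance convention):

```python
from statistics import pvariance

def item_analysis(responses):
    """responses: one list per student of 0/1 item scores.
    Returns (difficulty, discrimination, KR-20)."""
    n_students = len(responses)
    n_items = len(responses[0])
    totals = [sum(r) for r in responses]

    # Difficulty index p_i: proportion of students answering item i correctly.
    difficulty = [sum(r[i] for r in responses) / n_students
                  for i in range(n_items)]

    # Discrimination index D_i: p(upper 27% by total) - p(lower 27% by total).
    order = sorted(range(n_students), key=lambda s: totals[s])
    g = max(1, round(0.27 * n_students))
    lower, upper = order[:g], order[-g:]
    discrimination = [
        sum(responses[s][i] for s in upper) / g
        - sum(responses[s][i] for s in lower) / g
        for i in range(n_items)
    ]

    # KR-20 = (k/(k-1)) * (1 - sum(p*q) / variance of total scores).
    var_total = pvariance(totals)
    pq = sum(p * (1 - p) for p in difficulty)
    kr20 = (n_items / (n_items - 1)) * (1 - pq / var_total)
    return difficulty, discrimination, kr20
```

On this scale a difficulty near 0.5 (as reported) indicates a moderately hard test, and KR-20 above 0.7 is conventionally taken as acceptable internal consistency.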

Conclusion: This study demonstrates GPT-4's significant potential for generating medical education exam questions. While it effectively assesses basic knowledge recall, it fails to sufficiently evaluate higher-order cognitive processes outlined in Bloom's revised taxonomy. Future research should consider alternative methods that combine AI with expert evaluation and specialized multimodal models.

Source journal

Surgical and Radiologic Anatomy (Medicine: Pathology and Forensic Medicine)
CiteScore: 2.40
Self-citation rate: 14.30%

About the journal: Anatomy is a morphological science which cannot fail to interest the clinician. The practical application of anatomical research to clinical problems necessitates special adaptation and selectivity in choosing from numerous international works. Although there is a tendency to believe that meaningful advances in anatomy are unlikely, constant revision is necessary. Surgical and Radiologic Anatomy, the first international journal of clinical anatomy, was created in this spirit. Its goal is to serve clinicians, regardless of specialty (physicians, surgeons, radiologists, or other specialists), as an indispensable aid with which they can improve their knowledge of anatomy. Each issue includes original papers, review articles, articles on the anatomical bases of medical, surgical, and radiological techniques, articles on normal radiologic anatomy, and brief reviews of anatomical publications of clinical interest. Particular attention is given to high-quality illustrations, which are indispensable for a better understanding of anatomical problems. Surgical and Radiologic Anatomy is a journal written by anatomists for clinicians with a special interest in anatomy.