AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination

IF 1.3 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Emre Emekli, Betül Nalan Karahan
{"title":"AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination","authors":"Emre Emekli,&nbsp;Betül Nalan Karahan","doi":"10.1016/j.jmir.2025.101896","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>High-quality multiple-choice questions (MCQs) are essential for effective student assessment in health education. However, the manual creation of MCQs is labour-intensive, requiring significant time and expertise. With the increasing demand for large and continuously updated question banks, artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT, has emerged as a potential tool for automating question generation. While AI-assisted question generation has shown promise, its ability to match human-authored MCQs in terms of difficulty and discrimination indices remains unclear. This study aims to compare the effectiveness of AI-generated and faculty-authored MCQs in radiography education, addressing a critical gap in evaluating AI's role in assessment processes. The findings will be beneficial for educators and curriculum designers exploring AI integration into health education.</div></div><div><h3>Methods</h3><div>This study was conducted in Turkey during the 2024–2025 academic year. Participants included 56 students enrolled in the first year of the Medical Imaging Programme. Two separate 30-question MCQ exams were developed—one generated by ChatGPT-4o and the other by a faculty member. The questions were derived from radiographic anatomy and positioning content, covering topics such as cranial, vertebral, pelvic, and lower extremity radiographs. Each exam contained six questions per topic, categorised into easy, medium, and difficult levels. A quantitative research design was employed. Students took both exams on separate days, without knowing the source of the questions. Difficulty and discrimination indices were calculated for each question, and student feedback was collected using a 5-point Likert scale to evaluate their perceptions of the exams.</div></div><div><h3>Results</h3><div>A total of 56 out of 80 eligible students participated, yielding a response rate of 70 %. The mean number of correct answers are similar for ChatGPT (14.91 ± 4.25) and human expert exams (15.82 ± 4.73; p = 0.089). Exam scores showed moderate positive correlation (r = 0.628, p &lt; 0.001). ChatGPT achieved an average difficulty index of 0.50 versus 0.53 for human experts. Discrimination indices were acceptable for 73.33 % of ChatGPT questions and 86.67 % of human expert questions.</div></div><div><h3>Conclusion</h3><div>LLMs like ChatGPT can generate MCQs of comparable quality to human expert questions, though slight limitations in discrimination and difficulty alignment remain. 
These models hold promise for supplementing assessment processes in health education.</div></div>","PeriodicalId":46420,"journal":{"name":"Journal of Medical Imaging and Radiation Sciences","volume":"56 4","pages":"Article 101896"},"PeriodicalIF":1.3000,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging and Radiation Sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1939865425000463","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract

Background

High-quality multiple-choice questions (MCQs) are essential for effective student assessment in health education. However, the manual creation of MCQs is labour-intensive, requiring significant time and expertise. With the increasing demand for large and continuously updated question banks, artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT, has emerged as a potential tool for automating question generation. While AI-assisted question generation has shown promise, its ability to match human-authored MCQs in terms of difficulty and discrimination indices remains unclear. This study aims to compare the effectiveness of AI-generated and faculty-authored MCQs in radiography education, addressing a critical gap in evaluating AI's role in assessment processes. The findings will be beneficial for educators and curriculum designers exploring AI integration into health education.

Methods

This study was conducted in Turkey during the 2024–2025 academic year. Participants included 56 students enrolled in the first year of the Medical Imaging Programme. Two separate 30-question MCQ exams were developed—one generated by ChatGPT-4o and the other by a faculty member. The questions were derived from radiographic anatomy and positioning content, covering topics such as cranial, vertebral, pelvic, and lower extremity radiographs. Each exam contained six questions per topic, categorised into easy, medium, and difficult levels. A quantitative research design was employed. Students took both exams on separate days, without knowing the source of the questions. Difficulty and discrimination indices were calculated for each question, and student feedback was collected using a 5-point Likert scale to evaluate their perceptions of the exams.
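The item statistics referred to above follow classical test theory: an item's difficulty index is the proportion of students answering it correctly, and its discrimination index is commonly taken as the difference in that proportion between high- and low-scoring groups. A minimal sketch of this calculation in Python, assuming the conventional upper/lower 27% grouping (the abstract does not state the exact grouping the authors used), is shown below.

import numpy as np

def item_indices(responses, group_fraction=0.27):
    # responses: 2D array (students x items) of 0/1 scored answers
    responses = np.asarray(responses)
    n_students = responses.shape[0]
    totals = responses.sum(axis=1)            # total score per student
    order = np.argsort(totals)                # students ranked low to high
    k = max(1, int(round(group_fraction * n_students)))
    lower = responses[order[:k]]              # lowest-scoring group
    upper = responses[order[-k:]]             # highest-scoring group
    difficulty = responses.mean(axis=0)       # proportion answering each item correctly
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination

# Example with simulated (hypothetical) answers for 56 students and 30 items
rng = np.random.default_rng(0)
answers = (rng.random((56, 30)) < 0.55).astype(int)
p, d = item_indices(answers)
print("difficulty:", p.round(2))
print("discrimination:", d.round(2))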

Results

A total of 56 of 80 eligible students participated, yielding a response rate of 70%. The mean number of correct answers was similar for the ChatGPT exam (14.91 ± 4.25) and the human expert exam (15.82 ± 4.73; p = 0.089). Scores on the two exams showed a moderate positive correlation (r = 0.628, p < 0.001). ChatGPT questions had an average difficulty index of 0.50 versus 0.53 for human expert questions. Discrimination indices were acceptable for 73.33% of ChatGPT questions and 86.67% of human expert questions.
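For illustration, the two score-level comparisons reported here (a test of mean difference and a Pearson correlation) can be reproduced with standard statistical tools. The sketch below uses simulated per-student totals and a paired t-test; the abstract does not name the specific test behind p = 0.089, so this is an assumption for demonstration only.

import numpy as np
from scipy import stats

# Hypothetical per-student totals (0-30 correct); real data came from 56 students.
rng = np.random.default_rng(1)
chatgpt_totals = np.clip(rng.normal(14.91, 4.25, 56).round(), 0, 30)
human_totals = np.clip(rng.normal(15.82, 4.73, 56).round(), 0, 30)

t_stat, p_mean = stats.ttest_rel(chatgpt_totals, human_totals)   # paired comparison of means
r, p_corr = stats.pearsonr(chatgpt_totals, human_totals)         # correlation between the two exams
print(f"paired t-test: p = {p_mean:.3f}")
print(f"Pearson correlation: r = {r:.3f}, p = {p_corr:.3f}")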

Conclusion

LLMs like ChatGPT can generate MCQs of comparable quality to human expert questions, though slight limitations in discrimination and difficulty alignment remain. These models hold promise for supplementing assessment processes in health education.
Source journal
Journal of Medical Imaging and Radiation Sciences (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 2.30
Self-citation rate: 11.10%
Articles published: 231
Review time: 53 days
Journal introduction: Journal of Medical Imaging and Radiation Sciences is the official peer-reviewed journal of the Canadian Association of Medical Radiation Technologists. This journal is published four times a year and is circulated to approximately 11,000 medical radiation technologists, libraries, and radiology departments throughout Canada, the United States, and overseas. The Journal publishes articles on recent research, new technology and techniques, professional practices, and technologists' viewpoints, as well as relevant book reviews.