Comparison of applicability, difficulty, and discrimination indices of multiple-choice questions on medical imaging generated by different AI-based chatbots

Radiography (IF 2.5, Q2, Radiology, Nuclear Medicine & Medical Imaging)
B.N. Karahan, E. Emekli
{"title":"Comparison of applicability, difficulty, and discrimination indices of multiple-choice questions on medical imaging generated by different AI-based chatbots","authors":"B.N. Karahan ,&nbsp;E. Emekli","doi":"10.1016/j.radi.2025.103087","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Creating high-quality multiple-choice questions (MCQs) is vital in health education, particularly in fields like medical imaging. AI-based chatbots have emerged as a tool to automate this process. This study evaluates the applicability, difficulty, and discrimination indices of MCQs generated by various AI chatbots for medical imaging education.</div></div><div><h3>Methods</h3><div>80 MCQs were generated by seven AI-based chatbots (Claude 3, Claude 3.5, ChatGPT-3.5, ChatGPT-4.0, Copilot, Gemini, Turin Q, and Writesonic) using lecture materials. These questions were evaluated for relevance, accuracy, and originality by radiology faculty, and then administered to 56 students and 12 research assistants. The questions were analyzed using Miller's Pyramid to assess cognitive levels, with difficulty and discrimination indices calculated.</div></div><div><h3>Discussion</h3><div>AI-based chatbots generated MCQs suitable for medical imaging education, with 72.5 % of the questions deemed appropriate. Most questions assessed recall (79.31 %), suggesting that AI models excel at generating basic knowledge questions but struggle with higher cognitive skills. Differences in question quality were noted between chatbots, with Claude 3 being the most reliable. The difficulty index averaged 0.62, indicating a moderate level of difficulty, but some models produced easier questions.</div></div><div><h3>Conclusion</h3><div>AI chatbots show promise for automating MCQ creation in health education, though most questions focus on recall. For AI to fully support health education, further development is needed to improve question quality, especially in higher cognitive domains.</div></div><div><h3>Implication for practice</h3><div>AI-based chatbots can support educators in generating MCQs, especially for assessing basic knowledge in medical imaging. While useful for saving time, expert review remains essential to ensure question quality and to address higher-level cognitive skills. Integrating AI tools into assessment workflows may enhance efficiency, provided there is appropriate oversight.</div></div>","PeriodicalId":47416,"journal":{"name":"Radiography","volume":"31 5","pages":"Article 103087"},"PeriodicalIF":2.5000,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiography","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1078817425002317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0

Abstract

Introduction

Creating high-quality multiple-choice questions (MCQs) is vital in health education, particularly in fields like medical imaging. AI-based chatbots have emerged as a tool to automate this process. This study evaluates the applicability, difficulty, and discrimination indices of MCQs generated by various AI chatbots for medical imaging education.

Methods

Eighty MCQs were generated by eight AI-based chatbots (Claude 3, Claude 3.5, ChatGPT-3.5, ChatGPT-4.0, Copilot, Gemini, Turin Q, and Writesonic) using lecture materials. These questions were evaluated for relevance, accuracy, and originality by radiology faculty, and then administered to 56 students and 12 research assistants. The questions were analyzed using Miller's Pyramid to assess cognitive levels, and difficulty and discrimination indices were calculated.
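The abstract does not give the exact formulas used, but difficulty and discrimination indices conventionally refer to the standard classical-test-theory quantities: the proportion of examinees answering an item correctly, and the gap in that proportion between high and low scorers. A minimal sketch, assuming binary-scored responses and the common upper/lower-27% split (the function, variable names, and demo data are illustrative, not from the study):

```python
import numpy as np

def item_indices(responses: np.ndarray, group_frac: float = 0.27):
    """responses: binary matrix of shape (n_examinees, n_items), 1 = correct.

    Returns per-item (difficulty, discrimination) arrays.
    """
    n = responses.shape[0]
    totals = responses.sum(axis=1)          # each examinee's total score
    order = np.argsort(totals)              # lowest scorers first
    k = max(1, int(round(group_frac * n)))  # upper/lower group size (27% rule)
    lower, upper = order[:k], order[-k:]

    difficulty = responses.mean(axis=0)     # p: proportion answering correctly
    discrimination = (responses[upper].mean(axis=0)
                      - responses[lower].mean(axis=0))  # D = p_upper - p_lower
    return difficulty, discrimination

# Synthetic demo sized like the study's cohort (56 students + 12 research
# assistants = 68 examinees, 80 items); random data, for illustration only.
rng = np.random.default_rng(0)
answers = (rng.random((68, 80)) < 0.62).astype(int)
p, d = item_indices(answers)
print(f"mean difficulty {p.mean():.2f}, mean discrimination {d.mean():.2f}")
```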

Discussion

AI-based chatbots generated MCQs suitable for medical imaging education, with 72.5% of the questions deemed appropriate. Most questions assessed recall (79.31%), suggesting that AI models excel at generating basic-knowledge questions but struggle with higher-order cognitive skills. Question quality differed among chatbots, with Claude 3 being the most reliable. The difficulty index averaged 0.62, indicating a moderate level of difficulty, although some models produced easier questions.
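For orientation, the difficulty index is simply the proportion of correct responses, so the reported mean can be read directly as a percentage; the bands below are common rules of thumb, not thresholds stated in the abstract:

```latex
p = \frac{R}{N} \quad \text{($R$ = correct responses, $N$ = examinees)}, \qquad
\bar{p} = 0.62 \;\Rightarrow\; \text{62\% correct on average}
% Typical interpretation: p > 0.70 easy; 0.30 \le p \le 0.70 moderate; p < 0.30 hard.
```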

Conclusion

AI chatbots show promise for automating MCQ creation in health education, though most questions focus on recall. For AI to fully support health education, further development is needed to improve question quality, especially in higher cognitive domains.

Implications for practice

AI-based chatbots can support educators in generating MCQs, especially for assessing basic knowledge in medical imaging. While these tools can save time, expert review remains essential to ensure question quality and to address higher-level cognitive skills. Integrating AI tools into assessment workflows may enhance efficiency, provided there is appropriate oversight.
Source journal

Radiography (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 4.70
Self-citation rate: 34.60%
Articles per year: 169
Review time: 63 days

About the journal: Radiography is an international, English-language, peer-reviewed journal of diagnostic imaging and radiation therapy, and the official professional journal of the College of Radiographers, published quarterly. It aims to publish the highest-quality material, both clinical and scientific, on all aspects of diagnostic imaging, radiation therapy, and oncology.