{"title":"不同人工智能聊天机器人生成医学影像选择题的适用性、难度和判别指标比较","authors":"B.N. Karahan , E. Emekli","doi":"10.1016/j.radi.2025.103087","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Creating high-quality multiple-choice questions (MCQs) is vital in health education, particularly in fields like medical imaging. AI-based chatbots have emerged as a tool to automate this process. This study evaluates the applicability, difficulty, and discrimination indices of MCQs generated by various AI chatbots for medical imaging education.</div></div><div><h3>Methods</h3><div>80 MCQs were generated by seven AI-based chatbots (Claude 3, Claude 3.5, ChatGPT-3.5, ChatGPT-4.0, Copilot, Gemini, Turin Q, and Writesonic) using lecture materials. These questions were evaluated for relevance, accuracy, and originality by radiology faculty, and then administered to 56 students and 12 research assistants. The questions were analyzed using Miller's Pyramid to assess cognitive levels, with difficulty and discrimination indices calculated.</div></div><div><h3>Discussion</h3><div>AI-based chatbots generated MCQs suitable for medical imaging education, with 72.5 % of the questions deemed appropriate. Most questions assessed recall (79.31 %), suggesting that AI models excel at generating basic knowledge questions but struggle with higher cognitive skills. Differences in question quality were noted between chatbots, with Claude 3 being the most reliable. The difficulty index averaged 0.62, indicating a moderate level of difficulty, but some models produced easier questions.</div></div><div><h3>Conclusion</h3><div>AI chatbots show promise for automating MCQ creation in health education, though most questions focus on recall. For AI to fully support health education, further development is needed to improve question quality, especially in higher cognitive domains.</div></div><div><h3>Implication for practice</h3><div>AI-based chatbots can support educators in generating MCQs, especially for assessing basic knowledge in medical imaging. While useful for saving time, expert review remains essential to ensure question quality and to address higher-level cognitive skills. Integrating AI tools into assessment workflows may enhance efficiency, provided there is appropriate oversight.</div></div>","PeriodicalId":47416,"journal":{"name":"Radiography","volume":"31 5","pages":"Article 103087"},"PeriodicalIF":2.5000,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparison of applicability, difficulty, and discrimination indices of multiple-choice questions on medical imaging generated by different AI-based chatbots\",\"authors\":\"B.N. Karahan , E. Emekli\",\"doi\":\"10.1016/j.radi.2025.103087\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Introduction</h3><div>Creating high-quality multiple-choice questions (MCQs) is vital in health education, particularly in fields like medical imaging. AI-based chatbots have emerged as a tool to automate this process. This study evaluates the applicability, difficulty, and discrimination indices of MCQs generated by various AI chatbots for medical imaging education.</div></div><div><h3>Methods</h3><div>80 MCQs were generated by seven AI-based chatbots (Claude 3, Claude 3.5, ChatGPT-3.5, ChatGPT-4.0, Copilot, Gemini, Turin Q, and Writesonic) using lecture materials. 
These questions were evaluated for relevance, accuracy, and originality by radiology faculty, and then administered to 56 students and 12 research assistants. The questions were analyzed using Miller's Pyramid to assess cognitive levels, with difficulty and discrimination indices calculated.</div></div><div><h3>Discussion</h3><div>AI-based chatbots generated MCQs suitable for medical imaging education, with 72.5 % of the questions deemed appropriate. Most questions assessed recall (79.31 %), suggesting that AI models excel at generating basic knowledge questions but struggle with higher cognitive skills. Differences in question quality were noted between chatbots, with Claude 3 being the most reliable. The difficulty index averaged 0.62, indicating a moderate level of difficulty, but some models produced easier questions.</div></div><div><h3>Conclusion</h3><div>AI chatbots show promise for automating MCQ creation in health education, though most questions focus on recall. For AI to fully support health education, further development is needed to improve question quality, especially in higher cognitive domains.</div></div><div><h3>Implication for practice</h3><div>AI-based chatbots can support educators in generating MCQs, especially for assessing basic knowledge in medical imaging. While useful for saving time, expert review remains essential to ensure question quality and to address higher-level cognitive skills. Integrating AI tools into assessment workflows may enhance efficiency, provided there is appropriate oversight.</div></div>\",\"PeriodicalId\":47416,\"journal\":{\"name\":\"Radiography\",\"volume\":\"31 5\",\"pages\":\"Article 103087\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2025-07-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Radiography\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1078817425002317\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Radiography","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1078817425002317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Comparison of applicability, difficulty, and discrimination indices of multiple-choice questions on medical imaging generated by different AI-based chatbots
Introduction
Creating high-quality multiple-choice questions (MCQs) is vital in health education, particularly in fields like medical imaging. AI-based chatbots have emerged as tools to automate this process. This study evaluates the applicability, difficulty, and discrimination indices of MCQs generated by various AI chatbots for medical imaging education.
Methods
Eighty MCQs were generated from lecture materials by eight AI-based chatbots (Claude 3, Claude 3.5, ChatGPT-3.5, ChatGPT-4.0, Copilot, Gemini, Turin Q, and Writesonic). These questions were evaluated for relevance, accuracy, and originality by radiology faculty and then administered to 56 students and 12 research assistants. Each question's cognitive level was classified using Miller's Pyramid, and difficulty and discrimination indices were calculated.
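The difficulty and discrimination indices referred to here are standard classical test theory measures: difficulty is the proportion of examinees answering an item correctly, and discrimination compares performance between high- and low-scoring examinees. Below is a minimal illustrative sketch, assuming binary (0/1) item scores and the common 27 % upper/lower group convention; the abstract does not specify the authors' exact computation, so variable names and the split fraction are assumptions.

import numpy as np

def item_indices(responses: np.ndarray, tail: float = 0.27):
    """Return (difficulty, discrimination) arrays, one value per item.

    difficulty p = proportion of all examinees answering the item correctly
    discrimination D = p_upper - p_lower, where upper/lower are the top and
    bottom `tail` fraction of examinees ranked by total score.
    """
    totals = responses.sum(axis=1)              # each examinee's total score
    order = np.argsort(totals)                  # examinees sorted low -> high
    n = max(1, int(round(tail * len(totals))))  # size of each tail group
    lower, upper = order[:n], order[-n:]

    difficulty = responses.mean(axis=0)                      # p per item
    discrimination = (responses[upper].mean(axis=0)
                      - responses[lower].mean(axis=0))       # D per item
    return difficulty, discrimination

# Hypothetical data: 5 examinees x 3 items, scored 0/1
scores = np.array([[1, 0, 1],
                   [1, 1, 1],
                   [0, 0, 1],
                   [1, 1, 0],
                   [0, 0, 0]])
p, d = item_indices(scores)
print(p, d)  # p near 0.6 means about 60 % answered that item correctly

Under these formulas, the study's mean difficulty of 0.62 corresponds to roughly 62 % of examinees answering the average item correctly.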
Discussion
AI-based chatbots generated MCQs suitable for medical imaging education, with 72.5 % of the questions deemed appropriate. Most questions assessed recall (79.31 %), suggesting that AI models excel at generating basic knowledge questions but struggle with higher cognitive skills. Question quality differed between chatbots, with Claude 3 being the most reliable. The difficulty index averaged 0.62 (i.e., roughly 62 % of examinees answered the average item correctly), indicating moderate difficulty, although some models produced easier questions.
Conclusion
AI chatbots show promise for automating MCQ creation in health education, though most questions focus on recall. For AI to fully support health education, further development is needed to improve question quality, especially in higher cognitive domains.
Implications for practice
AI-based chatbots can support educators in generating MCQs, especially for assessing basic knowledge in medical imaging. While useful for saving time, expert review remains essential to ensure question quality and to address higher-level cognitive skills. Integrating AI tools into assessment workflows may enhance efficiency, provided there is appropriate oversight.
Radiography (JCR category: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)
CiteScore: 4.70
Self-citation rate: 34.60%
Articles published: 169
Review time: 63 days
Journal introduction:
Radiography is an International, English language, peer-reviewed journal of diagnostic imaging and radiation therapy. Radiography is the official professional journal of the College of Radiographers and is published quarterly. Radiography aims to publish the highest quality material, both clinical and scientific, on all aspects of diagnostic imaging and radiation therapy and oncology.