{"title":"AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination","authors":"Emre Emekli, Betül Nalan Karahan","doi":"10.1016/j.jmir.2025.101896","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>High-quality multiple-choice questions (MCQs) are essential for effective student assessment in health education. However, the manual creation of MCQs is labour-intensive, requiring significant time and expertise. With the increasing demand for large and continuously updated question banks, artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT, has emerged as a potential tool for automating question generation. While AI-assisted question generation has shown promise, its ability to match human-authored MCQs in terms of difficulty and discrimination indices remains unclear. This study aims to compare the effectiveness of AI-generated and faculty-authored MCQs in radiography education, addressing a critical gap in evaluating AI's role in assessment processes. The findings will be beneficial for educators and curriculum designers exploring AI integration into health education.</div></div><div><h3>Methods</h3><div>This study was conducted in Turkey during the 2024–2025 academic year. Participants included 56 students enrolled in the first year of the Medical Imaging Programme. Two separate 30-question MCQ exams were developed—one generated by ChatGPT-4o and the other by a faculty member. The questions were derived from radiographic anatomy and positioning content, covering topics such as cranial, vertebral, pelvic, and lower extremity radiographs. Each exam contained six questions per topic, categorised into easy, medium, and difficult levels. A quantitative research design was employed. Students took both exams on separate days, without knowing the source of the questions. Difficulty and discrimination indices were calculated for each question, and student feedback was collected using a 5-point Likert scale to evaluate their perceptions of the exams.</div></div><div><h3>Results</h3><div>A total of 56 out of 80 eligible students participated, yielding a response rate of 70 %. The mean number of correct answers are similar for ChatGPT (14.91 ± 4.25) and human expert exams (15.82 ± 4.73; p = 0.089). Exam scores showed moderate positive correlation (r = 0.628, p < 0.001). ChatGPT achieved an average difficulty index of 0.50 versus 0.53 for human experts. Discrimination indices were acceptable for 73.33 % of ChatGPT questions and 86.67 % of human expert questions.</div></div><div><h3>Conclusion</h3><div>LLMs like ChatGPT can generate MCQs of comparable quality to human expert questions, though slight limitations in discrimination and difficulty alignment remain. 
These models hold promise for supplementing assessment processes in health education.</div></div>","PeriodicalId":46420,"journal":{"name":"Journal of Medical Imaging and Radiation Sciences","volume":"56 4","pages":"Article 101896"},"PeriodicalIF":1.3000,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging and Radiation Sciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1939865425000463","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0
Abstract
Background
High-quality multiple-choice questions (MCQs) are essential for effective student assessment in health education. However, the manual creation of MCQs is labour-intensive, requiring significant time and expertise. With the increasing demand for large and continuously updated question banks, artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT, has emerged as a potential tool for automating question generation. While AI-assisted question generation has shown promise, its ability to match human-authored MCQs in terms of difficulty and discrimination indices remains unclear. This study aims to compare the effectiveness of AI-generated and faculty-authored MCQs in radiography education, addressing a critical gap in evaluating AI's role in assessment processes. The findings will be beneficial for educators and curriculum designers exploring AI integration into health education.
Methods
This study was conducted in Turkey during the 2024–2025 academic year. Participants included 56 students enrolled in the first year of the Medical Imaging Programme. Two separate 30-question MCQ exams were developed—one generated by ChatGPT-4o and the other by a faculty member. The questions were derived from radiographic anatomy and positioning content, covering topics such as cranial, vertebral, pelvic, and lower extremity radiographs. Each exam contained six questions per topic, categorised into easy, medium, and difficult levels. A quantitative research design was employed. Students took both exams on separate days, without knowing the source of the questions. Difficulty and discrimination indices were calculated for each question, and student feedback was collected using a 5-point Likert scale to evaluate their perceptions of the exams.
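The abstract does not state the exact formulas used for the item analysis. The sketch below assumes the standard definitions: the difficulty index as the proportion of students answering an item correctly, and the discrimination index as the difference in proportion correct between upper and lower scoring groups (here, the top and bottom 27%, a common convention). The function name, group fraction, and simulated data are illustrative only and are not taken from the study.

```python
import numpy as np

def item_analysis(responses, group_fraction=0.27):
    """Compute difficulty and discrimination indices for each item.

    responses: array of shape (n_students, n_items), 1 = correct, 0 = incorrect.
    group_fraction: share of students placed in the upper/lower groups.
    """
    responses = np.asarray(responses)
    n_students = responses.shape[0]

    # Difficulty index: proportion of students who answered each item correctly.
    difficulty = responses.mean(axis=0)

    # Rank students by total score and form upper/lower groups.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n_group = max(1, int(round(group_fraction * n_students)))
    lower = responses[order[:n_group]]
    upper = responses[order[-n_group:]]

    # Discrimination index: proportion correct in the upper group minus the lower group.
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)

    return difficulty, discrimination


# Illustrative run with simulated answers for 56 students and 30 items.
rng = np.random.default_rng(0)
simulated = (rng.random((56, 30)) < 0.5).astype(int)
p, d = item_analysis(simulated)
print("Mean difficulty:", p.mean().round(2))
print("Items with discrimination >= 0.20:", int((d >= 0.20).sum()))
```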
Results
A total of 56 out of 80 eligible students participated, yielding a response rate of 70%. The mean number of correct answers was similar for the ChatGPT exam (14.91 ± 4.25) and the human expert exam (15.82 ± 4.73; p = 0.089). Scores on the two exams showed a moderate positive correlation (r = 0.628, p < 0.001). The ChatGPT exam had an average difficulty index of 0.50 versus 0.53 for the human expert exam. Discrimination indices were acceptable for 73.33% of ChatGPT questions and 86.67% of human expert questions.
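For readers reproducing this kind of comparison, the score correlation can be computed with a Pearson test, and discrimination indices can be screened against the commonly used cut-off of D ≥ 0.20 for "acceptable" items. The abstract does not report the study's exact cut-off or per-student data, so the threshold and all values below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-student total scores on the two exams (not published in the abstract).
chatgpt_scores = np.array([14, 16, 12, 18, 15, 13, 17, 19, 11, 16])
human_scores = np.array([15, 17, 13, 19, 16, 12, 18, 20, 12, 17])

# Pearson correlation between the two sets of exam scores.
r, p_value = pearsonr(chatgpt_scores, human_scores)
print(f"Pearson r = {r:.3f}, p = {p_value:.3f}")

# Share of questions with an "acceptable" discrimination index (assumed cut-off: D >= 0.20).
discrimination = np.array([0.35, 0.10, 0.25, 0.40, 0.15, 0.30])  # illustrative values
acceptable = (discrimination >= 0.20).mean() * 100
print(f"Acceptable discrimination: {acceptable:.1f}% of items")
```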
Conclusion
LLMs like ChatGPT can generate MCQs of quality comparable to those written by human experts, though modest limitations in discrimination and difficulty alignment remain. These models hold promise for supplementing assessment processes in health education.
Journal description:
Journal of Medical Imaging and Radiation Sciences is the official peer-reviewed journal of the Canadian Association of Medical Radiation Technologists. The journal is published four times a year and is circulated to approximately 11,000 medical radiation technologists, libraries, and radiology departments throughout Canada, the United States, and overseas. It publishes articles on recent research, new technology and techniques, professional practices, and technologists' viewpoints, as well as relevant book reviews.