Can ChatGPT Generate Acceptable Case-Based Multiple-Choice Questions for Medical School Anatomy Exams? A Pilot Study on Item Difficulty and Discrimination.
Yavuz Selim Kıyak, Ayşe Soylu, Özlem Coşkun, Işıl İrem Budakoğlu, Tuncay Veysel Peker
{"title":"Can ChatGPT Generate Acceptable Case-Based Multiple-Choice Questions for Medical School Anatomy Exams? A Pilot Study on Item Difficulty and Discrimination.","authors":"Yavuz Selim Kıyak, Ayşe Soylu, Özlem Coşkun, Işıl İrem Budakoğlu, Tuncay Veysel Peker","doi":"10.1002/ca.24271","DOIUrl":null,"url":null,"abstract":"<p><p>Developing high-quality multiple-choice questions (MCQs) for medical school exams is effortful and time-consuming. In this study, we investigated the ability of ChatGPT to generate case-based anatomy MCQs with acceptable levels of item difficulty and discrimination for medical school exams. We used ChatGPT to generate case-based anatomy MCQs for an endocrine and urogenital system exam based on a framework for artificial intelligence (AI)-assisted item generation. The questions were evaluated by experts, approved by the department, and administered to 502 second-year medical students (372 Turkish-language, 130 English-language). The items were analyzed to determine the discrimination and difficulty indices. The item discrimination indices ranged from 0.29 to 0.54, indicating acceptable differentiation between high- and low-performing students. All items in Turkish (six out of six) and five out of six in English met the higher discrimination threshold (≥ 0.30) required for large-scale standardized tests. The item difficulty indices ranged from 0.41 to 0.89, most items falling within the moderate difficulty range (0.20-0.80). Therefore, it was concluded that ChatGPT can generate case-based anatomy MCQs with acceptable psychometric properties, offering a promising tool for medical educators. However, human expertise remains crucial for reviewing and refining AI-generated assessment items. Future research should explore AI-generated MCQs across various anatomy topics and investigate different AI models for question generation.</p>","PeriodicalId":50687,"journal":{"name":"Clinical Anatomy","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Clinical Anatomy","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/ca.24271","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ANATOMY & MORPHOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Developing high-quality multiple-choice questions (MCQs) for medical school exams is effortful and time-consuming. In this study, we investigated the ability of ChatGPT to generate case-based anatomy MCQs with acceptable levels of item difficulty and discrimination for medical school exams. We used ChatGPT to generate case-based anatomy MCQs for an endocrine and urogenital system exam based on a framework for artificial intelligence (AI)-assisted item generation. The questions were evaluated by experts, approved by the department, and administered to 502 second-year medical students (372 Turkish-language, 130 English-language). The items were analyzed to determine the discrimination and difficulty indices. The item discrimination indices ranged from 0.29 to 0.54, indicating acceptable differentiation between high- and low-performing students. All items in Turkish (six out of six) and five out of six in English met the higher discrimination threshold (≥ 0.30) required for large-scale standardized tests. The item difficulty indices ranged from 0.41 to 0.89, with most items falling within the moderate difficulty range (0.20-0.80). Therefore, it was concluded that ChatGPT can generate case-based anatomy MCQs with acceptable psychometric properties, offering a promising tool for medical educators. However, human expertise remains crucial for reviewing and refining AI-generated assessment items. Future research should explore AI-generated MCQs across various anatomy topics and investigate different AI models for question generation.
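For readers less familiar with classical item analysis, the sketch below illustrates how the two indices reported in the abstract are conventionally computed. It is a minimal Python illustration under stated assumptions, not the study's actual analysis code: the difficulty index P is the proportion of students answering an item correctly, and the discrimination index D here uses the common upper-lower group method (top versus bottom 27% of students by total score). The 27% split, the function names, and the simulated response data are all assumptions introduced for illustration.

```python
# Minimal sketch of classical item analysis, assuming a 0/1-scored
# response matrix (rows = students, columns = items). The 27% split
# and simulated data are illustrative assumptions.
import numpy as np

def item_difficulty(scores: np.ndarray) -> np.ndarray:
    """Difficulty index P: proportion of students answering each item correctly."""
    return scores.mean(axis=0)

def item_discrimination(scores: np.ndarray, frac: float = 0.27) -> np.ndarray:
    """Discrimination index D: proportion correct in the top `frac` of
    students (by total score) minus that in the bottom `frac`."""
    totals = scores.sum(axis=1)
    order = np.argsort(totals)                      # ascending by total score
    n = max(1, int(round(frac * scores.shape[0])))  # group size
    lower = scores[order[:n]]                       # bottom-scoring group
    upper = scores[order[-n:]]                      # top-scoring group
    return upper.mean(axis=0) - lower.mean(axis=0)

# Example: 502 students, 6 items, simulated 0/1 responses
rng = np.random.default_rng(0)
responses = (rng.random((502, 6)) < 0.65).astype(int)
print("P:", item_difficulty(responses).round(2))
print("D:", item_discrimination(responses).round(2))
```

Under this convention, an item with P between 0.20 and 0.80 falls in the moderate difficulty range, and D ≥ 0.30 meets the discrimination threshold the abstract cites for large-scale standardized tests.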
Journal Introduction:
Clinical Anatomy is the Official Journal of the American Association of Clinical Anatomists and the British Association of Clinical Anatomists. The goal of Clinical Anatomy is to provide a medium for the exchange of current information between anatomists and clinicians. This journal embraces anatomy in all its aspects as applied to medical practice. Furthermore, the journal assists physicians and other health care providers in keeping abreast of new methodologies for patient management and informs educators of new developments in clinical anatomy and teaching techniques. Clinical Anatomy publishes original and review articles of scientific, clinical, and educational interest. Papers covering the application of anatomic principles to the solution of clinical problems and/or the application of clinical observations to expand anatomic knowledge are welcomed.