Can Large Language Models Replicate Systematic Review Outcome Classifications in Medical Education? A Pilot Study Using Kirkpatrick Levels
Giuliano Romano, Emilio Romano, Michelle Rau
Medical Science Educator, 36(1), 11-15. Published 2026-01-16. DOI: 10.1007/s40670-026-02639-1
Citations: 0
Abstract
Systematic reviews in medical education often classify outcomes using the Kirkpatrick framework, but manual coding is time-consuming and subjective. We conducted a proof-of-concept study testing ChatGPT (GPT-5, August 2025 release) on 32 full-text articles from a published systematic review of sepsis education. Agreement with human-coded outcomes was modest: 50% agreement, unweighted κ = 0.170 (95% CI 0.000-0.458), weighted κ = 0.351 (95% CI 0.074-0.629). Most disagreements were between adjacent levels.
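The gap between the unweighted and weighted κ values comes from how the two statistics treat near misses. As a sketch only, the Python below computes percent agreement and Cohen's κ in its disagreement form, κ = 1 - Σ w·O / Σ w·E, where O and E are the observed and chance-expected joint proportions and w is a disagreement weight. The abstract does not say whether linear or quadratic weights were used or how the confidence intervals were obtained, so the weighting choice here is an assumption and no intervals are computed. It also assumes four ordinal Kirkpatrick levels coded 1 to 4; the function names and the example ratings are hypothetical and are not the authors' code or data.

```python
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of items on which both coders assigned the same level."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b, categories, weights="none"):
    """Cohen's kappa in disagreement form: 1 - sum(w*O) / sum(w*E).

    weights: "none" (unweighted), "linear", or "quadratic" (an assumption;
    the source does not state its scheme). List categories from lowest to
    highest level, because the weighted variants depend on their order.
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("none", "linear", "quadratic"):
        raise ValueError(f"unknown weights: {weights!r}")
    k, n = len(categories), len(a)
    if k < 2:
        raise ValueError("need at least two categories")
    pos = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions: rows = coder A, columns = coder B.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[pos[x]][pos[y]] += 1.0
    obs = [[cell / n for cell in row] for row in obs]

    # Marginal proportions for each coder (used for chance agreement).
    row_m = [sum(row) for row in obs]
    col_m = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, larger for distant levels.
        if weights == "none":
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return abs(i - j) / (k - 1)
        return ((i - j) / (k - 1)) ** 2

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * obs[i][j] for i, j in cells)
    expected = sum(weight(i, j) * row_m[i] * col_m[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa undefined: no chance-expected disagreement")
    return 1.0 - observed / expected


if __name__ == "__main__":
    levels = [1, 2, 3, 4]
    # Hypothetical codes for eight articles (NOT the study's data).
    human = [1, 2, 2, 3, 4, 2, 1, 3]
    model = [1, 2, 3, 3, 3, 2, 2, 3]
    print(f"percent agreement: {percent_agreement(human, model):.2f}")
    print(f"unweighted kappa:  {cohen_kappa(human, model, levels):.3f}")
    print(f"linear kappa:      {cohen_kappa(human, model, levels, 'linear'):.3f}")
```

With four levels, linear weights charge an adjacent-level disagreement one third of the penalty of a level 1 versus level 4 disagreement, so a coder whose errors are mostly adjacent tends to score higher on weighted than on unweighted κ, which is consistent with the pattern reported above.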
Supplementary information: The online version contains supplementary material available at 10.1007/s40670-026-02639-1.
About the journal:
Medical Science Educator is the successor to the journal JIAMSE and is the peer-reviewed publication of the International Association of Medical Science Educators (IAMSE). The journal offers everyone who teaches in healthcare the current information they need to succeed by publishing scholarly activities, opinions, and resources in medical science education. Published articles focus on teaching the sciences fundamental to modern medicine and health, including basic science education, clinical teaching, and the use of modern educational technologies. The journal aims to give readers a better understanding of teaching and learning techniques in order to advance medical science education.