{"title":"Using large language models (LLMs) to apply analytic rubrics to score post-encounter notes.","authors":"Christopher Runyon","doi":"10.1080/0142159X.2025.2504106","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language models (LLMs) show promise in medical education. This study examines LLMs' ability to score post-encounter notes (PNs) from Objective Structured Clinical Examinations (OSCEs) using an analytic rubric. The goal was to evaluate and refine methods for accurate, consistent scoring.</p><p><strong>Methods: </strong>Seven LLMs scored five PNs representing varying levels of performance, including an intentionally incorrect PN. An iterative experimental design tested different prompting strategies and temperature settings, a parameter controlling LLM response creativity. Scores were compared to expected rubric-based results.</p><p><strong>Results: </strong>Consistently accurate scoring required multiple rounds of prompt refinement. Simple prompting led to high variability, which improved with structured approaches and low-temperature settings. LLMs occasionally made errors calculating total scores, necessitating external calculation. The final approach yielded consistently accurate scores across all models.</p><p><strong>Conclusions: </strong>LLMs can reliably apply analytic rubrics to PNs with careful prompt engineering and process refinement. This study illustrates their potential as scalable, automated scoring tools in medical education, though further research is needed to explore their use with holistic rubrics. These findings demonstrate the utility of LLMs in assessment practices.</p>","PeriodicalId":18643,"journal":{"name":"Medical Teacher","volume":" ","pages":"1-9"},"PeriodicalIF":3.3000,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical Teacher","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1080/0142159X.2025.2504106","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Large language models (LLMs) show promise in medical education. This study examines LLMs' ability to score post-encounter notes (PNs) from Objective Structured Clinical Examinations (OSCEs) using an analytic rubric. The goal was to evaluate and refine methods for accurate, consistent scoring.
Methods: Seven LLMs scored five PNs representing varying levels of performance, including an intentionally incorrect PN. An iterative experimental design tested different prompting strategies and temperature settings (temperature is a parameter that controls how variable, or "creative", LLM responses are). Scores were compared to expected rubric-based results.
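The article does not publish its prompts or tooling, so the following is a minimal sketch of the kind of setup the Methods describe: sending one PN and an analytic rubric to a chat model at a low temperature with structured instructions. The rubric text, model name, prompt wording, and use of the OpenAI Python SDK are all illustrative assumptions, not the study's actual materials.

```python
# Hypothetical sketch: scoring one post-encounter note (PN) against an analytic
# rubric at temperature 0. All identifiers and prompt text are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = """Item 1 (0-2): Documents chief complaint and duration.
Item 2 (0-2): Lists at least three relevant differential diagnoses.
Item 3 (0-2): Orders an appropriate initial workup."""  # hypothetical analytic rubric

def score_note(post_encounter_note: str) -> str:
    """Ask the model to score each rubric item; return its raw JSON reply."""
    response = client.chat.completions.create(
        model="gpt-4o",      # any rubric-capable chat model
        temperature=0,        # low temperature to reduce run-to-run variability
        messages=[
            {"role": "system",
             "content": ("You are a clinical skills examiner. Score the note "
                         "against each rubric item. Reply only with JSON of the "
                         'form {"item_scores": [...]} and do not report a total.')},
            {"role": "user",
             "content": f"RUBRIC:\n{RUBRIC}\n\nPOST-ENCOUNTER NOTE:\n{post_encounter_note}"},
        ],
    )
    return response.choices[0].message.content
```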
Results: Consistently accurate scoring required multiple rounds of prompt refinement. Simple prompting produced highly variable scores; variability decreased with structured prompting approaches and low-temperature settings. LLMs occasionally made errors when calculating total scores, necessitating external calculation. The final approach yielded consistently accurate scores across all models.
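Because the Results report that the models sometimes miscalculated totals, the total score would be computed outside the model from the per-item scores it returns. The snippet below is a hypothetical continuation of the sketch above, assuming the model replies with the JSON shape requested there.

```python
# Hypothetical external total-score calculation: parse the model's per-item
# scores and sum them in code rather than trusting a model-reported total.
import json

def total_from_reply(raw_reply: str) -> int:
    """Parse the model's JSON reply and sum the item scores."""
    item_scores = json.loads(raw_reply)["item_scores"]
    return sum(int(s) for s in item_scores)

# Example: a reply of '{"item_scores": [2, 1, 2]}' yields a total of 5.
```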
Conclusions: LLMs can reliably apply analytic rubrics to PNs with careful prompt engineering and process refinement. This study illustrates their potential as scalable, automated scoring tools in medical education, though further research is needed to explore their use with holistic rubrics. These findings demonstrate the utility of LLMs in assessment practices.
Journal Introduction:
Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up-to-date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum—from medical procedures to policy changes in health care provision—is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.