Peter Yeates, Adriano Maluf, Gareth McCray, Ruth Kinston, Natalie Cope, Kathy Cullen, Vikki O'Neill, Aidan Cole, Ching-Wa Chung, Rhian Goodfellow, Rebecca Vallender, Sue Ensaff, Rikki Goddard-Fuller, Robert McKinley
{"title":"Inter-school variations in the standard of examiners' graduation-level OSCE judgements.","authors":"Peter Yeates, Adriano Maluf, Gareth McCray, Ruth Kinston, Natalie Cope, Kathy Cullen, Vikki O'Neill, Aidan Cole, Ching-Wa Chung, Rhian Goodfellow, Rebecca Vallender, Sue Ensaff, Rikki Goddard-Fuller, Robert McKinley","doi":"10.1080/0142159X.2024.2372087","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and resulting impact on students' pass/fail categorisation.</p><p><strong>Methods: </strong>The same 6 station formative OSCE ran asynchronously in 4 medical schools, with 2 parallel circuits/school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling 1/controlled score comparisons by a/examiner-cohorts and b/schools and 2/data linkage to adjust for the influence of examiner-cohorts. We calculated score impact and change in pass/fail categorisation by school.</p><p><strong>Results: </strong>On controlled video-based comparisons, inter-school variations in examiners' scoring (16.3%) were nearly double within-school variations (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school, with adjustment reducing failure rate from 39.13% to 8.70% (school 2) whilst increasing failure from 0.00% to 21.74% (school 4).</p><p><strong>Discussion: </strong>Whilst the formative context may partly account for differences, these findings query whether variations may exist between medical schools in examiners' judgements. This may benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for comparisons.</p>","PeriodicalId":18643,"journal":{"name":"Medical Teacher","volume":" ","pages":"735-743"},"PeriodicalIF":3.3000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical Teacher","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1080/0142159X.2024.2372087","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/7/8 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"EDUCATION, SCIENTIFIC DISCIPLINES","Score":null,"Total":0}
引用次数: 0
Abstract
Introduction: Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and resulting impact on students' pass/fail categorisation.
Methods: The same 6 station formative OSCE ran asynchronously in 4 medical schools, with 2 parallel circuits/school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling 1/controlled score comparisons by a/examiner-cohorts and b/schools and 2/data linkage to adjust for the influence of examiner-cohorts. We calculated score impact and change in pass/fail categorisation by school.
Results: On controlled video-based comparisons, inter-school variations in examiners' scoring (16.3%) were nearly double within-school variations (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school, with adjustment reducing failure rate from 39.13% to 8.70% (school 2) whilst increasing failure from 0.00% to 21.74% (school 4).
Discussion: Whilst the formative context may partly account for differences, these findings query whether variations may exist between medical schools in examiners' judgements. This may benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for comparisons.
期刊介绍:
Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up-to-date with the developments in educational methods that lead to more effective teaching and learning at a time when the content of the curriculum—from medical procedures to policy changes in health care provision—is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature and book reviews. All articles are peer reviewed.