Inter-school variations in the standard of examiners' graduation-level OSCE judgements.

IF 3.3 · Zone 2, Education · Q1 EDUCATION, SCIENTIFIC DISCIPLINES

Medical Teacher · Pub Date: 2025-04-01 · Epub Date: 2024-07-08 · DOI: 10.1080/0142159X.2024.2372087

Peter Yeates, Adriano Maluf, Gareth McCray, Ruth Kinston, Natalie Cope, Kathy Cullen, Vikki O'Neill, Aidan Cole, Ching-Wa Chung, Rhian Goodfellow, Rebecca Vallender, Sue Ensaff, Rikki Goddard-Fuller, Robert McKinley

Pages: 735-743 · Publication type: Journal Article

Citations: 0

Abstract

Introduction: Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and the resulting impact on students' pass/fail categorisation.

Methods: The same six-station formative OSCE ran asynchronously in four medical schools, with two parallel circuits per school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling (1) controlled score comparisons by (a) examiner-cohorts and (b) schools, and (2) data linkage to adjust for the influence of examiner-cohorts. We calculated score impact and change in pass/fail categorisation by school.

Results: On controlled video-based comparisons, inter-school variations in examiners' scoring (16.3%) were nearly double within-school variations (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school: adjustment reduced the failure rate from 39.13% to 8.70% in school 2, whilst increasing it from 0.00% to 21.74% in school 4.

Discussion: Whilst the formative context may partly account for differences, these findings query whether variations may exist between medical schools in examiners' judgements. This may benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for comparisons.
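
The adjustment idea described in the Methods can be illustrated with a minimal sketch. This is not the authors' analysis: the abstract does not state which statistical model VESCA used, so a simple mean-offset correction stands in for it here. Each examiner-cohort's offset is estimated from its scores on the shared comparator videos and then subtracted from its live scores. All function names, scores, and the pass mark of 60 are hypothetical and chosen only for illustration.

```python
from statistics import mean


def cohort_offsets(video_scores):
    """Estimate each examiner-cohort's stringency offset.

    video_scores maps cohort -> scores (percent) that cohort gave to the
    SAME comparator videos. The offset is the cohort mean minus the mean of
    all cohort means (positive = more lenient than average).
    """
    cohort_means = {c: mean(scores) for c, scores in video_scores.items()}
    overall = mean(cohort_means.values())
    return {c: m - overall for c, m in cohort_means.items()}


def adjust(live_scores, offsets):
    """Subtract the scoring cohort's offset from each live student score.

    live_scores is a list of (student, cohort, score) tuples.
    """
    return {student: score - offsets[cohort]
            for student, cohort, score in live_scores}


def failure_rate(scores, pass_mark):
    """Percentage of scores strictly below the pass mark."""
    failed = sum(1 for s in scores if s < pass_mark)
    return 100.0 * failed / len(scores)


# Hypothetical data: cohort A scores the shared videos higher than cohort B.
video_scores = {"A": [60.0, 70.0], "B": [50.0, 60.0]}
live_scores = [("s1", "A", 68.0), ("s2", "A", 63.0),
               ("s3", "B", 57.0), ("s4", "B", 52.0)]
PASS_MARK = 60.0  # hypothetical cut score

offsets = cohort_offsets(video_scores)
adjusted = adjust(live_scores, offsets)

# Failure rate per cohort, before and after adjustment.
for cohort in ("A", "B"):
    raw = [score for _, c, score in live_scores if c == cohort]
    adj = [adjusted[s] for s, c, _ in live_scores if c == cohort]
    print(cohort, failure_rate(raw, PASS_MARK), failure_rate(adj, PASS_MARK))
```

In this toy data the more lenient cohort's failure rate rises after adjustment and the more stringent cohort's falls, which is the same direction of effect the Results report for schools 4 and 2 respectively. The magnitudes here carry no meaning.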


Source journal

Medical Teacher (Medicine: Health Care)

CiteScore: 7.80

Self-citation rate: 8.50%

Articles published: 396

Review time: 3-6 weeks

About the journal: Medical Teacher provides accounts of new teaching methods, guidance on structuring courses and assessing achievement, and serves as a forum for communication between medical teachers and those involved in general education. In particular, the journal recognizes the problems teachers have in keeping up to date with developments in educational methods that lead to more effective teaching and learning, at a time when the content of the curriculum (from medical procedures to policy changes in health care provision) is also changing. The journal features reports of innovation and research in medical education, case studies, survey articles, practical guidelines, reviews of current literature, and book reviews. All articles are peer reviewed.