Applying generalized theory to optimize the quality of high-stakes objective structured clinical examinations for undergraduate medical students: experience from the French medical school.

IF 2.7 · CAS Tier 2 (Medicine) · JCR Q1, EDUCATION & EDUCATIONAL RESEARCH
Eva Feigerlova
{"title":"Applying generalized theory to optimize the quality of high-stakes objective structured clinical examinations for undergraduate medical students: experience from the French medical school.","authors":"Eva Feigerlova","doi":"10.1186/s12909-025-07255-y","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The national OSCE examination has recently been adopted in France as a prerequisite for medical students to enter accredited graduate education programs. However, the reliability and generalizability of OSCE scores are not well explored taking into account the national examination blueprint.</p><p><strong>Method: </strong>To obtain complementary information for monitoring and improving the quality of the OSCE we performed a pilot study applying generalizability (G-)theory on a sample of 6th-year undergraduate medical students (n = 73) who were assessed by 24 examiner pairs at three stations. Based on the national blueprint, three different scoring subunits (a dichotomous task-specific checklist evaluating clinical skills and behaviorally anchored scales evaluating generic skills and a global performance scale) were used to evaluate students and combined into a station score. A variance component analysis was performed using mixed modelling to identify the impact of different facets (station, student and student x station interactions) on the scoring subunits. The generalizability and dependability statistics were calculated.</p><p><strong>Results: </strong>There was no significant difference between mean scores attributable to different examiner pairs across the data. The examiner variance component was greater for the clinical skills score (14.4%) than for the generic skills (5.6%) and global performance scores (5.1%). The station variance component was largest for the clinical skills score, accounting for 22.9% of the total score variance, compared to 3% for the generic skills and 13.9% for global performance scores. The variance component related to student represented 12% of the total variance for clinicals skills, 17.4% for generic skills and 14.3% for global performance ratings. The combined generalizability coefficients across all the data were 0.59 for the clinical skills score, 0.93 for the generic skills score and 0.75 for global performance.</p><p><strong>Conclusions: </strong>The combined estimates of relative reliability across all data are greater for generic skills scores and global performance ratings than for clinical skills scores. This is likely explained by the fact that content-specific tasks evaluated using checklists produce greater variability in scores than scales evaluating broader competencies. 
This work can be valuable to other teaching institutions, as monitoring the sources of errors is a principal quality control strategy to ensure valid interpretations of the students' scores.</p>","PeriodicalId":51234,"journal":{"name":"BMC Medical Education","volume":"25 1","pages":"643"},"PeriodicalIF":2.7000,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12046744/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Education","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12909-025-07255-y","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract

Background: The national OSCE examination has recently been adopted in France as a prerequisite for medical students to enter accredited graduate education programs. However, the reliability and generalizability of OSCE scores have not been well explored in light of the national examination blueprint.

Method: To obtain complementary information for monitoring and improving the quality of the OSCE, we performed a pilot study applying generalizability (G-) theory to a sample of sixth-year undergraduate medical students (n = 73) who were assessed by 24 examiner pairs at three stations. Based on the national blueprint, three scoring subunits (a dichotomous task-specific checklist evaluating clinical skills, behaviorally anchored scales evaluating generic skills, and a global performance scale) were used to evaluate students and were combined into a station score. A variance component analysis was performed using mixed modelling to identify the impact of the different facets (station, student, and student × station interaction) on each scoring subunit. Generalizability and dependability statistics were calculated.
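To make the variance-component step concrete, here is a minimal G-study sketch for a simple persons × stations (p × s) crossed design with one score per student per station. This is not the paper's actual analysis: the study also models examiner pairs and three scoring subunits, the data below are randomly generated placeholders, and the classical expected-mean-square (ANOVA) solution is used in place of mixed modelling.

```python
import numpy as np

# Hypothetical G-study for a fully crossed p x s design:
# 73 students, 3 stations, placeholder scores (not the study's data).
rng = np.random.default_rng(0)
n_p, n_s = 73, 3
scores = rng.normal(70, 10, (n_p, n_s))

grand = scores.mean()
p_means = scores.mean(axis=1)          # per-student means
s_means = scores.mean(axis=0)          # per-station means

# Classical ANOVA sums of squares for the crossed design
ss_p = n_s * ((p_means - grand) ** 2).sum()
ss_s = n_p * ((s_means - grand) ** 2).sum()
ss_tot = ((scores - grand) ** 2).sum()
ss_ps = ss_tot - ss_p - ss_s           # interaction confounded with residual

ms_p = ss_p / (n_p - 1)
ms_s = ss_s / (n_s - 1)
ms_ps = ss_ps / ((n_p - 1) * (n_s - 1))

# Expected-mean-square solutions for the variance components
var_ps = ms_ps                          # sigma^2(ps,e)
var_p = max((ms_p - ms_ps) / n_s, 0.0)  # sigma^2(p), floored at zero
var_s = max((ms_s - ms_ps) / n_p, 0.0)  # sigma^2(s)

# Relative (generalizability) and absolute (dependability) coefficients
g_coef = var_p / (var_p + var_ps / n_s)
phi = var_p / (var_p + (var_s + var_ps) / n_s)
print(f"G = {g_coef:.2f}, Phi = {phi:.2f}")
```

In a D-study, re-evaluating these coefficients with a larger number of stations in the denominator shows how adding stations would improve reliability, which is the usual lever for strengthening an OSCE.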

Results: There was no significant difference between mean scores attributable to different examiner pairs across the data. The examiner variance component was greater for the clinical skills score (14.4%) than for the generic skills (5.6%) and global performance (5.1%) scores. The station variance component was largest for the clinical skills score, accounting for 22.9% of the total score variance, compared with 3% for generic skills and 13.9% for global performance. The student variance component represented 12% of the total variance for clinical skills, 17.4% for generic skills and 14.3% for global performance ratings. The combined generalizability coefficients across all the data were 0.59 for the clinical skills score, 0.93 for the generic skills score and 0.75 for global performance.
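For orientation, in the simplest crossed p × s design the relative generalizability coefficient and the absolute dependability coefficient take the following form (a simplification of the study's combined estimates, which also involve examiner facets):

$$
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{ps,e}/n_s},
\qquad
\Phi = \frac{\sigma^2_p}{\sigma^2_p + \left(\sigma^2_s + \sigma^2_{ps,e}\right)/n_s}
$$

Because the student (true-score) component sits in the numerator, the lower coefficient for clinical skills (0.59) is consistent with its smaller student component (12%) relative to its large station (22.9%) and examiner (14.4%) components, whereas the high coefficient for generic skills (0.93) pairs a larger student component (17.4%) with small station and examiner components.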

Conclusions: The combined estimates of relative reliability across all data are greater for generic skills scores and global performance ratings than for clinical skills scores. This is likely explained by the fact that content-specific tasks evaluated using checklists produce greater variability in scores than scales evaluating broader competencies. This work can be valuable to other teaching institutions, as monitoring the sources of errors is a principal quality control strategy to ensure valid interpretations of the students' scores.

Source journal
BMC Medical Education · EDUCATION, SCIENTIFIC DISCIPLINES
CiteScore: 4.90
Self-citation rate: 11.10%
Articles per year: 795
Review time: 6 months
Journal description: BMC Medical Education is an open access journal publishing original peer-reviewed research articles in relation to the training of healthcare professionals, including undergraduate, postgraduate, and continuing education. The journal has a special focus on curriculum development, evaluations of performance, assessment of training needs and evidence-based medicine.