A Comparison of Psychometric Properties of the American Board of Anesthesiology's In-Person and Virtual Standardized Oral Examinations

Mark T Keegan, Ann E Harman, Stacie G Deiner, Huaping Sun

Academic Medicine, 2025, pp. 86-93 (Epub 2024 Jun 7). DOI: 10.1097/ACM.0000000000005782
Citations: 0
Abstract
Purpose: The COVID-19 pandemic prompted training institutions and national credentialing organizations to administer examinations virtually. This study compared task difficulty, examiner grading, candidate performance, and other psychometric properties between in-person and virtual standardized oral examinations (SOEs) administered by the American Board of Anesthesiology.
Method: This retrospective study included SOEs administered in person from March 2018 to March 2020 and virtually from December 2020 to November 2021. The in-person and virtual SOEs share the same structure, comprising 4 tasks: preoperative evaluation, intraoperative management, postoperative care, and additional topics. The Many-Facet Rasch Model was used to estimate candidate performance, examiner grading severity, and task difficulty for the in-person and virtual SOEs separately; the virtual SOE was equated to the in-person SOE through common examiners and all tasks. The independent-samples t test and the partially overlapping-samples t test were used to compare candidate performance and examiner grading severity, respectively, between the 2 formats.
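The Many-Facet Rasch Model places candidates, examiners, and tasks on a common logit scale, so that an observed rating reflects candidate ability net of examiner severity and task difficulty. As a minimal, hedged illustration only (the operational SOE uses polytomous ratings and dedicated calibration software, not this simplified form), a dichotomous three-facet Rasch probability can be sketched in Python:

```python
import math

def rasch_facets_prob(theta: float, severity: float, difficulty: float) -> float:
    """Probability of a favorable rating under a simplified dichotomous
    three-facet Rasch model: candidate ability (theta) minus examiner
    grading severity minus task difficulty, all expressed in logits.
    Illustrative sketch only; the actual SOE scoring is polytomous."""
    logit = theta - severity - difficulty
    return 1.0 / (1.0 + math.exp(-logit))

# Example: a candidate at 2.9 logits rated by an average-severity examiner
# (0.0) on an average-difficulty task (0.0).
print(round(rasch_facets_prob(2.9, 0.0, 0.0), 3))  # ~0.948
```

Because the facets are additive on the logit scale, equating through common examiners and tasks lets candidate measures from the two administration formats be compared directly.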
Results: In-person (n = 3,462) and virtual (n = 2,959) first-time candidates were comparable in age, sex, race and ethnicity, and whether they were U.S. medical school graduates. The mean (standard deviation [SD]) candidate performance was 2.96 (1.76) logits for the virtual SOE, which was statistically significantly better than that for the in-person SOE (mean [SD], 2.86 [1.75]; Welch independent-samples t test, P = .02); however, the effect size was negligible (Cohen d = 0.06). The difference in the grading severity of examiners who rated the in-person (n = 398; mean [SD], 0.00 [0.73]) versus virtual (n = 341; mean [SD], 0.07 [0.77]) SOE was not statistically significant (Welch partially overlapping-samples t test, P = .07).
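The headline comparison can be reproduced from the summary statistics reported above. The following sketch (assuming only the means, SDs, and sample sizes given in the abstract, and using SciPy's summary-statistics form of the Welch t test) recovers the reported P value and Cohen d; the partially overlapping-samples test for examiners is less standard and is not reproduced here.

```python
import math
from scipy.stats import ttest_ind_from_stats

# Summary statistics as reported in the abstract (logits)
m_virtual, sd_virtual, n_virtual = 2.96, 1.76, 2959
m_inperson, sd_inperson, n_inperson = 2.86, 1.75, 3462

# Welch independent-samples t test (unequal variances)
t_stat, p_value = ttest_ind_from_stats(
    m_virtual, sd_virtual, n_virtual,
    m_inperson, sd_inperson, n_inperson,
    equal_var=False,
)

# Cohen's d using the pooled standard deviation
pooled_sd = math.sqrt(
    ((n_virtual - 1) * sd_virtual**2 + (n_inperson - 1) * sd_inperson**2)
    / (n_virtual + n_inperson - 2)
)
cohen_d = (m_virtual - m_inperson) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohen_d:.2f}")
# Output consistent with the abstract: p ≈ .02, d ≈ 0.06
```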
Conclusions: Candidate performance and examiner grading severity were comparable between the in-person and virtual SOEs, supporting the reliability and validity of the virtual oral exam in this large-volume, high-stakes setting.
About the Journal
Academic Medicine, the official peer-reviewed journal of the Association of American Medical Colleges, acts as an international forum for exchanging ideas, information, and strategies to address the significant challenges in academic medicine. The journal covers areas such as research, education, clinical care, community collaboration, and leadership, with a commitment to serving the public interest.