A validity study of COMLEX-USA Level 3 with the new test design.

IF 1.1 | Q2 | Medicine, General & Internal
Journal of Osteopathic Medicine | Pub Date: 2024-03-19 | eCollection Date: 2024-06-01 | DOI: 10.1515/jom-2023-0011
Xia Mao, John R Boulet, Jeanne M Sandella, Michael F Oliverio, Larissa Smith
{"title":"采用新测试设计的 COMLEX-USA Level 3 有效性研究。","authors":"Xia Mao, John R Boulet, Jeanne M Sandella, Michael F Oliverio, Larissa Smith","doi":"10.1515/jom-2023-0011","DOIUrl":null,"url":null,"abstract":"<p><strong>Context: </strong>The National Board of Osteopathic Medical Examiners (NBOME) administers the Comprehensive Osteopathic Medical Licensing Examination of the United States (COMLEX-USA), a three-level examination designed for licensure for the practice of osteopathic medicine. The examination design for COMLEX-USA Level 3 (L3) was changed in September 2018 to a two-day computer-based examination with two components: a multiple-choice question (MCQ) component with single best answer and a clinical decision-making (CDM) case component with extended multiple-choice (EMC) and short answer (SA) questions. Continued validation of the L3 examination, especially with the new design, is essential for the appropriate interpretation and use of the test scores.</p><p><strong>Objectives: </strong>The purpose of this study is to gather evidence to support the validity of the L3 examination scores under the new design utilizing sources of evidence based on Kane's validity framework.</p><p><strong>Methods: </strong>Kane's validity framework contains four components of evidence to support the validity argument: Scoring, Generalization, Extrapolation, and Implication/Decision. In this study, we gathered data from various sources and conducted analyses to provide evidence that the L3 examination is validly measuring what it is supposed to measure. These include reviewing content coverage of the L3 examination, documenting scoring and reporting processes, estimating the reliability and decision accuracy/consistency of the scores, quantifying associations between the scores from the MCQ and CDM components and between scores from different competency domains of the L3 examination, exploring the relationships between L3 scores and scores from a performance-based assessment that measures related constructs, performing subgroup comparisons, and describing and justifying the criterion-referenced standard setting process. The analysis data contains first-attempt test scores for 8,366 candidates who took the L3 examination between September 2018 and December 2019. The performance-based assessment utilized as a criterion measure in this study is COMLEX-USA Level 2 Performance Evaluation (L2-PE).</p><p><strong>Results: </strong>All assessment forms were built through the automated test assembly (ATA) procedure to maximize parallelism in terms of content coverage and statistical properties across the forms. Scoring and reporting follows industry-standard quality-control procedures. The inter-rater reliability of SA rating, decision accuracy, and decision consistency for pass/fail classifications are all very high. There is a statistically significant positive association between the MCQ and the CDM components of the L3 examination. The patterns of associations, both within the L3 subscores and with L2-PE domain scores, fit with what is being measured. The subgroup comparisons by gender, race, and first language showed expected small differences in mean scores between the subgroups within each category and yielded findings that are consistent with those described in the literature. 
The L3 pass/fail standard was established through implementation of a defensible criterion-referenced procedure.</p><p><strong>Conclusions: </strong>This study provides some additional validity evidence for the L3 examination based on Kane's validity framework. The validity of any measurement must be established through ongoing evaluation of the related evidence. The NBOME will continue to collect evidence to support validity arguments for the COMLEX-USA examination series.</p>","PeriodicalId":36050,"journal":{"name":"Journal of Osteopathic Medicine","volume":" ","pages":"257-265"},"PeriodicalIF":1.1000,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A validity study of COMLEX-USA Level 3 with the new test design.\",\"authors\":\"Xia Mao, John R Boulet, Jeanne M Sandella, Michael F Oliverio, Larissa Smith\",\"doi\":\"10.1515/jom-2023-0011\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Context: </strong>The National Board of Osteopathic Medical Examiners (NBOME) administers the Comprehensive Osteopathic Medical Licensing Examination of the United States (COMLEX-USA), a three-level examination designed for licensure for the practice of osteopathic medicine. The examination design for COMLEX-USA Level 3 (L3) was changed in September 2018 to a two-day computer-based examination with two components: a multiple-choice question (MCQ) component with single best answer and a clinical decision-making (CDM) case component with extended multiple-choice (EMC) and short answer (SA) questions. Continued validation of the L3 examination, especially with the new design, is essential for the appropriate interpretation and use of the test scores.</p><p><strong>Objectives: </strong>The purpose of this study is to gather evidence to support the validity of the L3 examination scores under the new design utilizing sources of evidence based on Kane's validity framework.</p><p><strong>Methods: </strong>Kane's validity framework contains four components of evidence to support the validity argument: Scoring, Generalization, Extrapolation, and Implication/Decision. In this study, we gathered data from various sources and conducted analyses to provide evidence that the L3 examination is validly measuring what it is supposed to measure. These include reviewing content coverage of the L3 examination, documenting scoring and reporting processes, estimating the reliability and decision accuracy/consistency of the scores, quantifying associations between the scores from the MCQ and CDM components and between scores from different competency domains of the L3 examination, exploring the relationships between L3 scores and scores from a performance-based assessment that measures related constructs, performing subgroup comparisons, and describing and justifying the criterion-referenced standard setting process. The analysis data contains first-attempt test scores for 8,366 candidates who took the L3 examination between September 2018 and December 2019. The performance-based assessment utilized as a criterion measure in this study is COMLEX-USA Level 2 Performance Evaluation (L2-PE).</p><p><strong>Results: </strong>All assessment forms were built through the automated test assembly (ATA) procedure to maximize parallelism in terms of content coverage and statistical properties across the forms. Scoring and reporting follows industry-standard quality-control procedures. 
The inter-rater reliability of SA rating, decision accuracy, and decision consistency for pass/fail classifications are all very high. There is a statistically significant positive association between the MCQ and the CDM components of the L3 examination. The patterns of associations, both within the L3 subscores and with L2-PE domain scores, fit with what is being measured. The subgroup comparisons by gender, race, and first language showed expected small differences in mean scores between the subgroups within each category and yielded findings that are consistent with those described in the literature. The L3 pass/fail standard was established through implementation of a defensible criterion-referenced procedure.</p><p><strong>Conclusions: </strong>This study provides some additional validity evidence for the L3 examination based on Kane's validity framework. The validity of any measurement must be established through ongoing evaluation of the related evidence. The NBOME will continue to collect evidence to support validity arguments for the COMLEX-USA examination series.</p>\",\"PeriodicalId\":36050,\"journal\":{\"name\":\"Journal of Osteopathic Medicine\",\"volume\":\" \",\"pages\":\"257-265\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2024-03-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Osteopathic Medicine\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1515/jom-2023-0011\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/6/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICINE, GENERAL & INTERNAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Osteopathic Medicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1515/jom-2023-0011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/6/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
Citations: 0

Abstract


Context: The National Board of Osteopathic Medical Examiners (NBOME) administers the Comprehensive Osteopathic Medical Licensing Examination of the United States (COMLEX-USA), a three-level examination designed for licensure for the practice of osteopathic medicine. The examination design for COMLEX-USA Level 3 (L3) was changed in September 2018 to a two-day computer-based examination with two components: a multiple-choice question (MCQ) component with single best answer and a clinical decision-making (CDM) case component with extended multiple-choice (EMC) and short answer (SA) questions. Continued validation of the L3 examination, especially with the new design, is essential for the appropriate interpretation and use of the test scores.

Objectives: The purpose of this study is to gather evidence to support the validity of the L3 examination scores under the new design utilizing sources of evidence based on Kane's validity framework.

Methods: Kane's validity framework contains four components of evidence to support the validity argument: Scoring, Generalization, Extrapolation, and Implication/Decision. In this study, we gathered data from various sources and conducted analyses to provide evidence that the L3 examination is validly measuring what it is supposed to measure. These include reviewing content coverage of the L3 examination, documenting scoring and reporting processes, estimating the reliability and decision accuracy/consistency of the scores, quantifying associations between the scores from the MCQ and CDM components and between scores from different competency domains of the L3 examination, exploring the relationships between L3 scores and scores from a performance-based assessment that measures related constructs, performing subgroup comparisons, and describing and justifying the criterion-referenced standard setting process. The analysis data contains first-attempt test scores for 8,366 candidates who took the L3 examination between September 2018 and December 2019. The performance-based assessment utilized as a criterion measure in this study is COMLEX-USA Level 2 Performance Evaluation (L2-PE).
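
The abstract does not include the underlying computations, but two of the analyses it lists, quantifying the association between MCQ and CDM component scores and estimating inter-rater reliability for SA ratings, can be illustrated with a short sketch. The snippet below is a minimal Python example on synthetic data; the choice of Pearson correlation and Cohen's kappa, the score scales, and all variable names are assumptions made for illustration, not details taken from the study.

```python
# Illustrative sketch only: synthetic scores and assumed statistics
# (Pearson r, Cohen's kappa); the study's actual analyses may differ.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 1000  # hypothetical number of first-attempt candidates

# Hypothetical MCQ and CDM component scores, correlated by construction.
mcq_score = rng.normal(500, 80, n)
cdm_score = 0.6 * mcq_score + rng.normal(0, 60, n)

r, p = pearsonr(mcq_score, cdm_score)
print(f"MCQ-CDM association: r = {r:.2f}, p = {p:.3g}")

# Inter-rater agreement on short-answer (SA) ratings: two hypothetical
# raters scoring the same responses on a 0-3 scale.
def cohens_kappa(a, b, categories):
    """Cohen's kappa for two raters over a fixed set of categories."""
    idx = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    confusion = np.zeros((k, k))
    for x, y in zip(a, b):
        confusion[idx[x], idx[y]] += 1
    confusion /= confusion.sum()
    p_obs = np.trace(confusion)                             # observed agreement
    p_exp = confusion.sum(axis=1) @ confusion.sum(axis=0)   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

rater_a = rng.integers(0, 4, 500)
agree = rng.random(500) < 0.85  # raters agree on ~85% of responses
rater_b = np.where(agree, rater_a, rng.integers(0, 4, 500))
print(f"SA inter-rater kappa: {cohens_kappa(rater_a, rater_b, [0, 1, 2, 3]):.2f}")
```

In the study itself these quantities were estimated on operational data from the 8,366 first-attempt candidates; the sketch only shows the general shape of such calculations.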

Results: All assessment forms were built through the automated test assembly (ATA) procedure to maximize parallelism across forms in terms of content coverage and statistical properties. Scoring and reporting follow industry-standard quality-control procedures. The inter-rater reliability of SA ratings and the decision accuracy and consistency of pass/fail classifications are all very high. There is a statistically significant positive association between the MCQ and CDM components of the L3 examination. The patterns of associations, both within the L3 subscores and with L2-PE domain scores, are consistent with what is being measured. Subgroup comparisons by gender, race, and first language showed the expected small differences in mean scores between subgroups within each category and yielded findings consistent with those described in the literature. The L3 pass/fail standard was established through a defensible criterion-referenced procedure.
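
Similarly, the pass/fail decision consistency and the subgroup comparisons summarized above can be sketched in a few lines. The cut score, the parallel-form approximation of consistency, the subgroup labels, and the use of Cohen's d as the effect-size measure are all assumptions for this illustration; the abstract does not specify the study's actual decision-consistency estimator or comparison method.

```python
# Illustrative sketch only: assumed cut score, synthetic subgroups,
# and Cohen's d as the effect-size measure.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
total_score = rng.normal(500, 80, n)
cut_score = 400  # hypothetical pass/fail standard

# Decision consistency approximated by re-classifying candidates on a
# simulated parallel form (same true score, fresh measurement error).
parallel_form = total_score + rng.normal(0, 30, n)
consistency = np.mean((total_score >= cut_score) == (parallel_form >= cut_score))
print(f"Approximate decision consistency: {consistency:.2f}")

# Subgroup comparison with a standardized mean difference (Cohen's d).
group = rng.choice(["A", "B"], size=n)
a, b = total_score[group == "A"], total_score[group == "B"]
pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                    / (len(a) + len(b) - 2))
d = (a.mean() - b.mean()) / pooled_sd
print(f"Cohen's d between hypothetical subgroups: {d:.2f}")
```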

Conclusions: This study provides some additional validity evidence for the L3 examination based on Kane's validity framework. The validity of any measurement must be established through ongoing evaluation of the related evidence. The NBOME will continue to collect evidence to support validity arguments for the COMLEX-USA examination series.

Source journal: Journal of Osteopathic Medicine (Health Professions: Complementary and Manual Therapy)
CiteScore: 2.20
Self-citation rate: 13.30%
Articles published: 118