Examining Rater Performance on the CELBAN Speaking: A Many-Facets Rasch Measurement Analysis

Impact factor: 0.5 | JCR category: Language & Linguistics

Canadian Journal of Applied Linguistics | Pub Date: 2020-10-16 | DOI: 10.37213/cjal.2020.30436

Peiyu Wang, Karen L. Coetzee, Andy Strachan, S. Monteiro, Liying Cheng

Citations: 2

Abstract

Examining Rater Performance on the CELBAN Speaking: A Many-Facets Rasch Measurement Analysis

Internationally educated nurses’ (IENs) English language proficiency is critical to professional licensure as communication is a key competency for safe practice. The Canadian English Language Benchmark Assessment for Nurses (CELBAN) is Canada’s only Canadian Language Benchmarks (CLB) referenced examination used in the context of healthcare regulation. This high-stakes assessment claims proof of proficiency for IENs seeking licensure in Canada and a measure of public safety for nursing regulators. Understanding the quality of rater performance when examination results are used for high-stakes decisions is crucial to maintaining speaking test quality as it involves judgement, and thus requires strong reliability evidence (Koizumi et al., 2017). This study examined rater performance on the CELBAN Speaking component using a Many-Facets Rasch Measurement (MFRM). Specifically, this study identified CELBAN rater reliability in terms of consistency and severity, rating bias, and use of rating scale. The study was based on a sample of 115 raters across eight test sites in Canada and results on 2698 examinations across four parallel versions. Findings demonstrated relatively high inter-rater reliability and intra-rater reliability, and that CLB-based speaking descriptors (CLB 6-9) provided sufficient information for raters to discriminate examinees’ oral proficiency. There was no influence of test site or test version, offering validity evidence to support test use for high-stakes purposes. Grammar, among the eight speaking criteria, was identified as the most difficult criterion on the scale, and the one demonstrating most rater bias. This study highlights the value of MFRM analysis in rater performance research with implications for rater training. This study is one of the first research studies using MFRM with a CLB-referenced high-stakes assessment within the Canadian context.
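
The abstract names MFRM without stating the model. As a minimal sketch, assuming the standard rating-scale form of the many-facet Rasch model, the log-odds of an examinee receiving category k rather than k-1 is taken to be examinee ability minus rater severity minus criterion difficulty minus the category threshold. The function names, the facet values, and the four-category scale below are hypothetical illustrations, not estimates or specifications from the study.

```python
import math


def mfrm_category_probs(ability, severity, difficulty, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    Assumed model: log(P_k / P_{k-1}) = ability - severity - difficulty - F_k,
    with all parameters in logits. Returns probabilities for categories
    0..len(thresholds).
    """
    # Log-numerators are cumulative sums of the adjacent-category log-odds;
    # the lowest category is the reference with log-numerator 0.
    log_num = [0.0]
    for f_k in thresholds:
        log_num.append(log_num[-1] + ability - severity - difficulty - f_k)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_num)
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected category index given category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values for illustration only (not taken from the study).
thresholds = [-1.5, 0.0, 1.5]  # three thresholds, i.e. four score categories
lenient = mfrm_category_probs(0.5, -0.5, 0.3, thresholds)  # lenient rater
severe = mfrm_category_probs(0.5, 0.5, 0.3, thresholds)    # severe rater

print([round(p, 3) for p in lenient])
print(expected_score(lenient) > expected_score(severe))
```

Under this assumed model, the same examinee on the same criterion is expected to score lower with a more severe rater, which is the kind of rater effect that severity and bias analyses aim to detect.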

Source journal

Canadian Journal of Applied Linguistics (Language & Linguistics)

CiteScore: 1.00

Self-citation rate: 0.00%

Review time: 52 weeks