A sequential approach to detecting differential rater functioning in sparse rater-mediated assessment networks

IF 2.4 | Zone 1, Literature | LANGUAGE & LINGUISTICS

Language Testing. Pub Date: 2022-05-12. DOI: 10.1177/02655322221092388

Stefanie A. Wind

Language Testing, volume 40, issue 1, pages 209-226. https://doi.org/10.1177/02655322221092388

Abstract

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting DRF may be limited in sparse rating designs, where it is not possible for every rater to score every student. In these designs, there is limited information with which to detect DRF. Sparse designs can also exacerbate the impact of artificial DRF, which occurs when raters are inaccurately flagged for DRF due to statistical artifacts. In this study, a sequential method is adapted from previous research on differential item functioning (DIF) that allows researchers to detect DRF more accurately and distinguish between true and artificial DRF. Analyses of data from a rater-mediated writing assessment and a simulation study demonstrate that the sequential approach results in different conclusions about which raters exhibit DRF. Moreover, the simulation study results suggest that the sequential procedure results in improved accuracy in DRF detection across a variety of rating design conditions. Practical implications for language testing research are discussed.
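The contrast between a one-shot DRF screen and a sequential one can be illustrated with a toy example. The sketch below is not the article's procedure: the abstract does not specify the measurement model, so this example substitutes raw score residuals for model-based severity estimates, and the data, the function names, the group labels `A` and `B`, and the 0.5 threshold are all hypothetical. It only shows why re-estimating student baselines after setting aside the most extreme rater can remove artificial flags in an incomplete rating design.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (rater, student, student_group, score).
# The design is incomplete: R1 and R2 each score only two students.
# R4 is built to be severe toward group A only; R1-R3 are unbiased.
RATINGS = [
    ("R1", "s1", "A", 3), ("R1", "s3", "B", 3),
    ("R2", "s2", "A", 3), ("R2", "s4", "B", 3),
    ("R3", "s1", "A", 3), ("R3", "s2", "A", 3),
    ("R3", "s3", "B", 3), ("R3", "s4", "B", 3),
    ("R4", "s1", "A", 0), ("R4", "s2", "A", 0),
    ("R4", "s3", "B", 3), ("R4", "s4", "B", 3),
]


def baselines(ratings, excluded):
    """Mean score per student, ignoring ratings from excluded raters."""
    scores = defaultdict(list)
    for rater, student, _group, score in ratings:
        if rater not in excluded:
            scores[student].append(score)
    return {student: mean(vals) for student, vals in scores.items()}


def contrasts(ratings, excluded, focal="A", reference="B"):
    """Per rater: mean residual for the focal group minus the reference group."""
    base = baselines(ratings, excluded)
    resid = defaultdict(lambda: defaultdict(list))
    for rater, student, group, score in ratings:
        if student in base:
            resid[rater][group].append(score - base[student])
    return {
        rater: mean(groups[focal]) - mean(groups[reference])
        for rater, groups in resid.items()
        if groups[focal] and groups[reference]
    }


def single_pass_flags(ratings, threshold):
    """Flag every rater whose contrast exceeds the threshold in one pass."""
    found = contrasts(ratings, set())
    return {rater for rater, c in found.items() if abs(c) > threshold}


def sequential_flags(ratings, threshold):
    """Flag the most extreme rater, recompute baselines without it, repeat."""
    flagged = set()
    while True:
        found = contrasts(ratings, flagged)
        remaining = {r: c for r, c in found.items() if r not in flagged}
        if not remaining:
            return flagged
        worst = max(remaining, key=lambda r: abs(remaining[r]))
        if abs(remaining[worst]) <= threshold:
            return flagged
        flagged.add(worst)


print(sorted(single_pass_flags(RATINGS, 0.5)))
print(sorted(sequential_flags(RATINGS, 0.5)))
```

In this toy data the single pass flags all four raters, because R4's severity toward group A pulls down the baselines that the other raters are compared against. The sequential version sets R4 aside first, after which the remaining raters show no contrast. A real analysis would replace the residuals with estimates from a fitted measurement model and a statistical test rather than a fixed cutoff.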

Source journal

Language Testing

CiteScore: 6.70

Self-citation rate: 9.80%

About the journal: Language Testing is a fully peer-reviewed international journal that publishes original research and review articles on language testing and assessment. It provides a forum for the exchange of ideas and information between people working in the fields of first and second language testing and assessment. This includes researchers and practitioners in EFL and ESL testing, and assessment in child language acquisition and language pathology. In addition, special attention is given to issues of testing theory, experimental investigations, and the follow-up of practical implications.