Predictive Modeling of Rater Behavior: Implications for Quality Assurance in Essay Scoring

IF 1.1, Region 4 (Education), JCR Q3 (EDUCATION & EDUCATIONAL RESEARCH)

Applied Measurement in Education. Pub Date: 2020-07-02. DOI: 10.1080/08957347.2020.1750406

I. Bejar, Chen Li, D. McCaffrey

Volume 33, pages 234-247.

Citations: 1

Abstract

We evaluate the feasibility of developing predictive models of rater behavior, that is, rater-specific models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays used by the e-rater® engine. Specifically, for each rater, the linear regression of rater scores on the linguistic attributes was obtained from data in each of two consecutive time periods. The regression from each period was cross-validated against data from the other period. Raters were characterized in terms of their level of predictability and the importance of the predictors. Results suggest that rater models capture stable individual differences among raters. To evaluate the feasibility of using rater models as a quality control mechanism, we examined the relationship between rater predictability and both inter-rater agreement and performance on pre-scored essays. Finally, we conducted a simulation in which raters were made to score exclusively as a function of essay length at different points during the scoring day. We conclude that predictive rater models merit further investigation as a means of quality control for human scoring.
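The cross-period validation described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the four generic features, the 1 to 6 score scale, the use of a correlation as the predictability index, and all function names are assumptions made for the example, and the actual e-rater® attributes are not reproduced here.

```python
import numpy as np


def fit_rater_model(features, scores):
    """Ordinary least squares of one rater's scores on essay features.

    Returns the coefficient vector with the intercept first.
    """
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef


def predict(coef, features):
    """Predicted scores from a fitted rater model."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def cross_period_predictability(feats_a, scores_a, feats_b, scores_b):
    """Fit a model in each period and validate it on the other period.

    Predictability is summarized here (an assumption) as the correlation
    between predicted and observed scores in the held-out period.
    """
    coef_a = fit_rater_model(feats_a, scores_a)
    coef_b = fit_rater_model(feats_b, scores_b)
    r_a_to_b = np.corrcoef(predict(coef_a, feats_b), scores_b)[0, 1]
    r_b_to_a = np.corrcoef(predict(coef_b, feats_a), scores_a)[0, 1]
    return r_a_to_b, r_b_to_a


# Synthetic data for one hypothetical rater (all values are made up).
rng = np.random.default_rng(0)
true_weights = np.array([0.8, 0.5, 0.3, 0.1])  # hypothetical feature weights


def simulate_period(n_essays):
    """Generate features and integer scores on an assumed 1-6 scale."""
    feats = rng.normal(size=(n_essays, true_weights.size))
    raw = 3.5 + feats @ true_weights + rng.normal(scale=0.5, size=n_essays)
    return feats, np.clip(np.rint(raw), 1, 6)


feats_1, scores_1 = simulate_period(200)
feats_2, scores_2 = simulate_period(200)

r_ab, r_ba = cross_period_predictability(feats_1, scores_1, feats_2, scores_2)
print(f"period 1 model on period 2: r = {r_ab:.2f}")
print(f"period 2 model on period 1: r = {r_ba:.2f}")
```

A rater whose cross-period correlations stay high in both directions would count as predictable under this simplified index; a sudden drop could then be one signal worth reviewing, in the spirit of the quality control use the abstract proposes.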

Source journal

Applied Measurement in Education Multiple-

CiteScore

2.50

Self-citation rate

13.30%

Journal description: Because interaction between the domains of research and application is critical to the evaluation and improvement of new educational measurement practices, the prime objective of Applied Measurement in Education is to improve communication between academicians and practitioners. To help bridge the gap between theory and practice, articles in this journal describe original research studies, innovative strategies for solving educational measurement problems, and integrative reviews of current approaches to contemporary measurement issues. Peer Review Policy: All review papers in this journal have undergone editorial screening and peer review.