Detecting Rater Centrality Effects in Performance Assessments: A Model-Based Comparison of Centrality Indices

Impact factor: 0.6; JCR quartile: Q3; category: Social Sciences, Interdisciplinary

Measurement-Interdisciplinary Research and Perspectives. Pub Date: 2022-10-02. DOI: 10.1080/15366367.2021.1972654

K. Jin, T. Eckes

Pages: 228 - 247. Publication type: Journal Article.

Citations: 6


Detecting Rater Centrality Effects in Performance Assessments: A Model-Based Comparison of Centrality Indices

ABSTRACT Recent research on rater effects in performance assessments has increasingly focused on rater centrality, the tendency to assign scores clustering around the rating scale’s middle categories. In the present paper, we adopted Jin and Wang’s (2018) extended facets modeling approach and constructed a centrality continuum, ranging from raters exhibiting strong central tendencies to raters exhibiting strong tendencies in the opposite direction (extremity). In two simulation studies, we examined three model-based centrality detection indices (rater infit statistics, residual–expected correlations, and rater threshold SDs) as well as the raw-score SD in terms of their efficiency of reconstructing the true rater centrality rank ordering. Findings confirmed the superiority of the residual–expected correlation, rater threshold SD, and raw-score SD statistics, particularly when the examinee sample size was large and the number of scoring criteria was high. By contrast, the infit statistic results were much less consistent and, under conditions of large differences between criterion difficulties, suggested erroneous conclusions about raters’ central tendencies. Analyzing real rating data from a large-scale speaking performance assessment confirmed that infit statistics are unsuitable for identifying raters’ central tendencies. The discussion focuses on detecting centrality effects under different facets models and the indices’ implications for rater monitoring and fair performance assessment.
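Of the four indices named in the abstract, the raw-score SD is the only one that needs no fitted facets model, so it is the easiest to illustrate. The following is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the toy ratings, and the choice of the population SD are all hypothetical, and it assumes every rater scored a comparable set of examinees. A smaller SD is read here as a stronger tendency toward the middle categories, a larger SD as a tendency toward extremity.

```python
from statistics import pstdev


def raw_score_sd(ratings):
    """Return {rater: SD of that rater's raw scores}.

    Uses the population SD (an assumption; the abstract does not say
    which SD formula was used).
    """
    return {rater: pstdev(scores) for rater, scores in ratings.items()}


def rank_by_centrality(ratings):
    """Order raters from smallest SD (most central) to largest SD (most extreme)."""
    sds = raw_score_sd(ratings)
    return sorted(sds, key=sds.get)


# Hypothetical toy data on a 1-5 rating scale; not taken from the paper.
toy_ratings = {
    "rater_A": [3, 3, 3, 2, 4, 3],  # scores cluster around the middle category
    "rater_B": [1, 5, 2, 4, 3, 5],  # scores spread across the scale
    "rater_C": [1, 5, 1, 5, 1, 5],  # scores sit at the scale's end points
}

sds = raw_score_sd(toy_ratings)
order = rank_by_centrality(toy_ratings)

for rater in order:
    print(f"{rater}: SD = {sds[rater]:.3f}")
```

In a simulation like the ones the abstract describes, an ordering of this kind would then be compared with the known true centrality ordering; the comparison step is left out here because the abstract does not specify how it was computed.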


Source journal

Measurement-Interdisciplinary Research and Perspectives (SOCIAL SCIENCES, INTERDISCIPLINARY)

CiteScore

1.80

Self-citation rate

0.00%

Articles published