{"title":"Examining severity and centrality effects in TestDaF writing and speaking assessments: An extended Bayesian many-facet Rasch analysis","authors":"T. Eckes, K. Jin","doi":"10.1080/15305058.2021.1963260","DOIUrl":null,"url":null,"abstract":"Abstract Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang’s (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing and speaking assessments using Bayesian MCMC methods. The findings revealed that (a) the extended facets model had a better data–model fit than models that ignored either or both kinds of rater effects, (b) rating scale and partial credit versions of the extended model differed in terms of data–model fit for writing and speaking, (c) rater severity and centrality estimates were not significantly correlated with each other, and (d) centrality effects had a demonstrable impact on examinee rank orderings. The discussion focuses on implications for the analysis and evaluation of rating quality in performance assessments.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.0000,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Testing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/15305058.2021.1963260","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOCIAL SCIENCES, INTERDISCIPLINARY","Score":null,"Total":0}
引用次数: 6
Abstract
Abstract Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang’s (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing and speaking assessments using Bayesian MCMC methods. The findings revealed that (a) the extended facets model had a better data–model fit than models that ignored either or both kinds of rater effects, (b) rating scale and partial credit versions of the extended model differed in terms of data–model fit for writing and speaking, (c) rater severity and centrality estimates were not significantly correlated with each other, and (d) centrality effects had a demonstrable impact on examinee rank orderings. The discussion focuses on implications for the analysis and evaluation of rating quality in performance assessments.