{"title":"检验TestDaF写作和口语评估中的严重性和中心性效应:一种扩展的贝叶斯多方面Rasch分析","authors":"T. Eckes, K. Jin","doi":"10.1080/15305058.2021.1963260","DOIUrl":null,"url":null,"abstract":"Abstract Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang’s (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing and speaking assessments using Bayesian MCMC methods. The findings revealed that (a) the extended facets model had a better data–model fit than models that ignored either or both kinds of rater effects, (b) rating scale and partial credit versions of the extended model differed in terms of data–model fit for writing and speaking, (c) rater severity and centrality estimates were not significantly correlated with each other, and (d) centrality effects had a demonstrable impact on examinee rank orderings. The discussion focuses on implications for the analysis and evaluation of rating quality in performance assessments.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.0000,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Examining severity and centrality effects in TestDaF writing and speaking assessments: An extended Bayesian many-facet Rasch analysis\",\"authors\":\"T. Eckes, K. Jin\",\"doi\":\"10.1080/15305058.2021.1963260\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang’s (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing and speaking assessments using Bayesian MCMC methods. The findings revealed that (a) the extended facets model had a better data–model fit than models that ignored either or both kinds of rater effects, (b) rating scale and partial credit versions of the extended model differed in terms of data–model fit for writing and speaking, (c) rater severity and centrality estimates were not significantly correlated with each other, and (d) centrality effects had a demonstrable impact on examinee rank orderings. The discussion focuses on implications for the analysis and evaluation of rating quality in performance assessments.\",\"PeriodicalId\":46615,\"journal\":{\"name\":\"International Journal of Testing\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2021-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Testing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/15305058.2021.1963260\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"SOCIAL SCIENCES, INTERDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Testing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/15305058.2021.1963260","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SOCIAL SCIENCES, INTERDISCIPLINARY","Score":null,"Total":0}
Examining severity and centrality effects in TestDaF writing and speaking assessments: An extended Bayesian many-facet Rasch analysis
Abstract Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang’s (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing and speaking assessments using Bayesian MCMC methods. The findings revealed that (a) the extended facets model had a better data–model fit than models that ignored either or both kinds of rater effects, (b) rating scale and partial credit versions of the extended model differed in terms of data–model fit for writing and speaking, (c) rater severity and centrality estimates were not significantly correlated with each other, and (d) centrality effects had a demonstrable impact on examinee rank orderings. The discussion focuses on implications for the analysis and evaluation of rating quality in performance assessments.