Title: Benchmark rating procedure, best of both worlds? Comparing procedures to rate text quality in a reliable and valid manner
Authors: Renske Bouwer, M. Koster, H. van den Bergh
Journal: Assessment in Education-Principles Policy & Practice (impact factor 2.7; JCR Q1, Education & Educational Research)
Published: 2023-07-04
Publication type: Journal Article
DOI: https://doi.org/10.1080/0969594X.2023.2241656
Citations: 5
Abstract
Assessing students’ writing performance is essential to adequately monitor and promote individual writing development, but it is also a challenge. The present research investigates a benchmark rating procedure for assessing texts written by upper-elementary students. In two studies we examined whether a benchmark rating procedure (1) leads to reliable and generalisable scores that converge with holistic and analytic ratings, and (2) can be used for rating texts varying in topic and genre. The results indicate that benchmark ratings are a valid indicator of text quality, as they converge with holistic and analytic scores. They are also associated with less rater variance and less task-specific variance, leading to reliable and generalisable ratings. Moreover, a benchmark scale can be used for rating different tasks with the same reliability, at least when texts are written in the same genre. Taken together, a benchmark rating procedure ensures meaningful and useful information on students’ writing.
About the journal:
Recent decades have witnessed significant developments in the field of educational assessment. New approaches to the assessment of student achievement have been complemented by the increasing prominence of educational assessment as a policy issue. In particular, there has been a growth of interest in modes of assessment that promote, as well as measure, standards and quality. These have profound implications for individual learners, institutions and the educational system itself. Assessment in Education provides a focus for scholarly output in the field of assessment. The journal is explicitly international in focus and encourages contributions from a wide range of assessment systems and cultures. The journal's intention is to explore both commonalities and differences in policy and practice.