{"title":"第二语言写作评估中的比较判断:在众包、社区驱动和训练有素的评判员群体中的信度和效度","authors":"Peter Thwaites , Pauline Jadoulle , Magali Paquot","doi":"10.1016/j.asw.2025.100937","DOIUrl":null,"url":null,"abstract":"<div><div>Several recent studies have explored the use of comparative judgement for assessing second language writing. One of the claimed advantages of this method is that it generates valid assessments even when judgements are conducted by individuals outside of the traditional language assessment community. However, evidence in support of this claim largely focuses on concurrent validity – i.e. the extent to which CJ rating scales generated by various groups of judges correlate with rubric-based assessments. Little evidence exists of the construct validity of using CJ for L2 writing assessment. The present study seeks to address this by exploring what judges pay attention to while making comparative judgements. Three distinct groups of judges assessed the same set of 25 English L2 argumentative essays, leaving comments after each of their decisions. These comments were then analysed in order to explore the construct relevance and construct representativeness of each judge group’s rating scale. The results suggest that these scales differ in the extent to which they can be considered valid assessments of the target essays.</div></div>","PeriodicalId":46865,"journal":{"name":"Assessing Writing","volume":"65 ","pages":"Article 100937"},"PeriodicalIF":4.2000,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparative judgment in L2 writing assessment: Reliability and validity across crowdsourced, community-driven, and trained rater groups of judges\",\"authors\":\"Peter Thwaites , Pauline Jadoulle , Magali Paquot\",\"doi\":\"10.1016/j.asw.2025.100937\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Several recent studies have explored the use of comparative judgement for assessing second language writing. One of the claimed advantages of this method is that it generates valid assessments even when judgements are conducted by individuals outside of the traditional language assessment community. However, evidence in support of this claim largely focuses on concurrent validity – i.e. the extent to which CJ rating scales generated by various groups of judges correlate with rubric-based assessments. Little evidence exists of the construct validity of using CJ for L2 writing assessment. The present study seeks to address this by exploring what judges pay attention to while making comparative judgements. Three distinct groups of judges assessed the same set of 25 English L2 argumentative essays, leaving comments after each of their decisions. These comments were then analysed in order to explore the construct relevance and construct representativeness of each judge group’s rating scale. 
The results suggest that these scales differ in the extent to which they can be considered valid assessments of the target essays.</div></div>\",\"PeriodicalId\":46865,\"journal\":{\"name\":\"Assessing Writing\",\"volume\":\"65 \",\"pages\":\"Article 100937\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Assessing Writing\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1075293525000248\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Assessing Writing","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1075293525000248","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Comparative judgment in L2 writing assessment: Reliability and validity across crowdsourced, community-driven, and trained rater groups of judges
Abstract: Several recent studies have explored the use of comparative judgement (CJ) for assessing second language (L2) writing. One of the claimed advantages of this method is that it generates valid assessments even when judgements are made by individuals outside the traditional language assessment community. However, evidence in support of this claim largely focuses on concurrent validity, i.e. the extent to which CJ rating scales generated by various groups of judges correlate with rubric-based assessments. Little evidence exists of the construct validity of using CJ for L2 writing assessment. The present study seeks to address this by exploring what judges pay attention to while making comparative judgements. Three distinct groups of judges assessed the same set of 25 English L2 argumentative essays, leaving comments after each of their decisions. These comments were then analysed to explore the construct relevance and construct representativeness of each judge group's rating scale. The results suggest that these scales differ in the extent to which they can be considered valid assessments of the target essays.
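The abstract does not say how each group's pairwise decisions were converted into a rating scale, but in comparative judgement research this is typically done by fitting a Bradley-Terry model to the win/loss record, with concurrent validity then checked as a rank correlation between the fitted scale and rubric scores. The sketch below illustrates that common pipeline, not the study's actual procedure; the essay count, judgement pairs, and rubric scores are all invented for illustration.

```python
"""Minimal sketch of the usual CJ scaling pipeline (Bradley-Terry),
with invented data -- not this study's actual method or data."""

import numpy as np
from scipy.stats import spearmanr


def bradley_terry(n_items, comparisons, n_iter=500):
    """Fit Bradley-Terry strengths with the standard MM fixed-point update.

    comparisons: list of (winner, loser) index pairs, one per judgement.
    Returns centred log-strengths, i.e. the CJ scale.
    """
    wins = np.zeros(n_items)            # judgements won per essay
    met = np.zeros((n_items, n_items))  # times essays i and j were paired
    for w, l in comparisons:
        wins[w] += 1
        met[w, l] += 1
        met[l, w] += 1

    p = np.ones(n_items)                # initial strengths
    for _ in range(n_iter):
        # MM update: p_i = W_i / sum_j [ n_ij / (p_i + p_j) ]
        denom = (met / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins / denom
        p /= p.sum()                    # fix the scale's arbitrary unit
    logp = np.log(p)
    return logp - logp.mean()


# Invented toy data: 5 essays, 10 pairwise decisions as (winner, loser).
judgements = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1),
              (4, 3), (4, 0), (2, 1), (3, 2), (1, 4)]
cj_scale = bradley_terry(5, judgements)

# Hypothetical rubric scores for the same essays (concurrent validity check).
rubric = np.array([3.0, 2.0, 2.5, 4.5, 4.0])
rho, _ = spearmanr(cj_scale, rubric)

print("CJ scale (log-strengths):", np.round(cj_scale, 2))
print("Spearman correlation with rubric scores:", round(rho, 2))
```

The MM update assumes a connected comparison design in which every essay both wins and loses at least once; real CJ studies typically ensure this by randomising many pairings per essay.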
About the journal:
Assessing Writing is a refereed international journal providing a forum for ideas, research and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised) testing of writing, alternative performance assessments (such as portfolios), workplace sampling and classroom assessment. The journal focuses on all stages of the writing assessment process, including needs evaluation, assessment creation, implementation and validation, and test development.