{"title":"第二语言写作评估中的比较判断:在众包、社区驱动和训练有素的评判员群体中的信度和效度","authors":"Peter Thwaites , Pauline Jadoulle , Magali Paquot","doi":"10.1016/j.asw.2025.100937","DOIUrl":null,"url":null,"abstract":"<div><div>Several recent studies have explored the use of comparative judgement for assessing second language writing. One of the claimed advantages of this method is that it generates valid assessments even when judgements are conducted by individuals outside of the traditional language assessment community. However, evidence in support of this claim largely focuses on concurrent validity – i.e. the extent to which CJ rating scales generated by various groups of judges correlate with rubric-based assessments. Little evidence exists of the construct validity of using CJ for L2 writing assessment. The present study seeks to address this by exploring what judges pay attention to while making comparative judgements. Three distinct groups of judges assessed the same set of 25 English L2 argumentative essays, leaving comments after each of their decisions. These comments were then analysed in order to explore the construct relevance and construct representativeness of each judge group’s rating scale. The results suggest that these scales differ in the extent to which they can be considered valid assessments of the target essays.</div></div>","PeriodicalId":46865,"journal":{"name":"Assessing Writing","volume":"65 ","pages":"Article 100937"},"PeriodicalIF":4.2000,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparative judgment in L2 writing assessment: Reliability and validity across crowdsourced, community-driven, and trained rater groups of judges\",\"authors\":\"Peter Thwaites , Pauline Jadoulle , Magali Paquot\",\"doi\":\"10.1016/j.asw.2025.100937\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Several recent studies have explored the use of comparative judgement for assessing second language writing. One of the claimed advantages of this method is that it generates valid assessments even when judgements are conducted by individuals outside of the traditional language assessment community. However, evidence in support of this claim largely focuses on concurrent validity – i.e. the extent to which CJ rating scales generated by various groups of judges correlate with rubric-based assessments. Little evidence exists of the construct validity of using CJ for L2 writing assessment. The present study seeks to address this by exploring what judges pay attention to while making comparative judgements. Three distinct groups of judges assessed the same set of 25 English L2 argumentative essays, leaving comments after each of their decisions. These comments were then analysed in order to explore the construct relevance and construct representativeness of each judge group’s rating scale. 
The results suggest that these scales differ in the extent to which they can be considered valid assessments of the target essays.</div></div>\",\"PeriodicalId\":46865,\"journal\":{\"name\":\"Assessing Writing\",\"volume\":\"65 \",\"pages\":\"Article 100937\"},\"PeriodicalIF\":4.2000,\"publicationDate\":\"2025-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Assessing Writing\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1075293525000248\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Assessing Writing","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1075293525000248","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Comparative judgment in L2 writing assessment: Reliability and validity across crowdsourced, community-driven, and trained rater groups of judges
Abstract: Several recent studies have explored the use of comparative judgement (CJ) for assessing second language (L2) writing. One of the claimed advantages of this method is that it generates valid assessments even when judgements are made by individuals outside the traditional language assessment community. However, evidence in support of this claim largely focuses on concurrent validity, i.e. the extent to which CJ rating scales generated by various groups of judges correlate with rubric-based assessments. Little evidence exists of the construct validity of using CJ for L2 writing assessment. The present study seeks to address this by exploring what judges pay attention to while making comparative judgements. Three distinct groups of judges assessed the same set of 25 English L2 argumentative essays, leaving comments after each of their decisions. These comments were then analysed to explore the construct relevance and construct representativeness of each judge group's rating scale. The results suggest that these scales differ in the extent to which they can be considered valid assessments of the target essays.
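The abstract does not say how each group's pairwise decisions were converted into a rating scale, but in comparative judgement research this is typically done by fitting a Bradley-Terry model to the win/loss record, with concurrent validity then checked as a rank correlation between the fitted scale and rubric scores. The sketch below illustrates that common pipeline, not the study's actual procedure; the essay count, judgement pairs, and rubric scores are all invented for illustration.

```python
"""Minimal sketch of the usual CJ scaling pipeline (Bradley-Terry),
with invented data -- not this study's actual method or data."""

import numpy as np
from scipy.stats import spearmanr


def bradley_terry(n_items, comparisons, n_iter=500):
    """Fit Bradley-Terry strengths with the standard MM fixed-point update.

    comparisons: list of (winner, loser) index pairs, one per judgement.
    Returns centred log-strengths, i.e. the CJ scale.
    """
    wins = np.zeros(n_items)            # judgements won per essay
    met = np.zeros((n_items, n_items))  # times essays i and j were paired
    for w, l in comparisons:
        wins[w] += 1
        met[w, l] += 1
        met[l, w] += 1

    p = np.ones(n_items)                # initial strengths
    for _ in range(n_iter):
        # MM update: p_i = W_i / sum_j [ n_ij / (p_i + p_j) ]
        denom = (met / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins / denom
        p /= p.sum()                    # fix the scale's arbitrary unit
    logp = np.log(p)
    return logp - logp.mean()


# Invented toy data: 5 essays, 10 pairwise decisions as (winner, loser).
judgements = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1),
              (4, 3), (4, 0), (2, 1), (3, 2), (1, 4)]
cj_scale = bradley_terry(5, judgements)

# Hypothetical rubric scores for the same essays (concurrent validity check).
rubric = np.array([3.0, 2.0, 2.5, 4.5, 4.0])
rho, _ = spearmanr(cj_scale, rubric)

print("CJ scale (log-strengths):", np.round(cj_scale, 2))
print("Spearman correlation with rubric scores:", round(rho, 2))
```

The MM update assumes a connected comparison design in which every essay both wins and loses at least once; real CJ studies typically ensure this by randomising many pairings per essay.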
About the journal:
Assessing Writing is a refereed international journal providing a forum for ideas, research and practice on the assessment of written language. Assessing Writing publishes articles, book reviews, conference reports, and academic exchanges concerning writing assessments of all kinds, including traditional (direct and standardised) testing of writing, alternative performance assessments (such as portfolios), workplace sampling and classroom assessment. The journal focuses on all stages of the writing assessment process, including needs evaluation, assessment creation, implementation and validation, and test development.