Triangulating NLP-based analysis of rater comments and MFRM: An innovative approach to investigating raters’ application of rating scales in writing assessment

IF 2.2 · CAS Zone 1 (Literature) · LANGUAGE & LINGUISTICS
Huiying Cai, Xun Yan
{"title":"基于 NLP 的评分者评语分析和多指标评分法(MFRM)的三角分析:调查评分者在写作评估中应用评分量表的创新方法","authors":"Huiying Cai, Xun Yan","doi":"10.1177/02655322231210231","DOIUrl":null,"url":null,"abstract":"Rater comments tend to be qualitatively analyzed to indicate raters’ application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The data consisted of ratings on 987 essays by 36 raters (a total of 3948 analytic scores and 1974 rater comments) on a post-admission English Placement Test (EPT) at a large US university. We computed a set of comment-based features based on the analytic components and evaluative language the raters used to infer whether raters were aligned to the scale. For data triangulation, we performed correlation analyses between the MFRM measures of rater performance and the comment-based measures. Although the EPT raters showed overall satisfactory performance, we found meaningful associations between rater comments and performance features. In particular, raters with higher precision and fit to what the Rasch model predicts used more analytic components and used evaluative language more similar to the scale descriptors. These findings suggest that NLP techniques have the potential to help language testers analyze rater comments and understand rater behavior.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":"2 1","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Triangulating NLP-based analysis of rater comments and MFRM: An innovative approach to investigating raters’ application of rating scales in writing assessment\",\"authors\":\"Huiying Cai, Xun Yan\",\"doi\":\"10.1177/02655322231210231\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Rater comments tend to be qualitatively analyzed to indicate raters’ application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The data consisted of ratings on 987 essays by 36 raters (a total of 3948 analytic scores and 1974 rater comments) on a post-admission English Placement Test (EPT) at a large US university. We computed a set of comment-based features based on the analytic components and evaluative language the raters used to infer whether raters were aligned to the scale. For data triangulation, we performed correlation analyses between the MFRM measures of rater performance and the comment-based measures. Although the EPT raters showed overall satisfactory performance, we found meaningful associations between rater comments and performance features. In particular, raters with higher precision and fit to what the Rasch model predicts used more analytic components and used evaluative language more similar to the scale descriptors. 
These findings suggest that NLP techniques have the potential to help language testers analyze rater comments and understand rater behavior.\",\"PeriodicalId\":17928,\"journal\":{\"name\":\"Language Testing\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2023-11-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Language Testing\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1177/02655322231210231\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"LANGUAGE & LINGUISTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Language Testing","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1177/02655322231210231","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
Citations: 0

Abstract

Rater comments tend to be qualitatively analyzed to indicate raters’ application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The data consisted of ratings on 987 essays by 36 raters (a total of 3948 analytic scores and 1974 rater comments) on a post-admission English Placement Test (EPT) at a large US university. We computed a set of comment-based features, drawing on the analytic components and evaluative language the raters used, to infer whether raters were aligned with the scale. For data triangulation, we performed correlation analyses between the MFRM measures of rater performance and the comment-based measures. Although the EPT raters showed overall satisfactory performance, we found meaningful associations between rater comments and performance features. In particular, raters with higher precision and better fit to Rasch-model predictions used more analytic components and used evaluative language more similar to the scale descriptors. These findings suggest that NLP techniques have the potential to help language testers analyze rater comments and understand rater behavior.
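For context on the triangulation design: in Linacre's standard many-facet Rasch formulation, the log-odds of essay n receiving category k rather than k−1 from rater j on analytic component i is modeled as ln(P_nijk / P_nij(k−1)) = B_n − D_i − C_j − F_k, where B_n is writer ability, D_i component difficulty, C_j rater severity, and F_k the category threshold; rater precision and fit statistics derive from this model. The Python sketch below illustrates the kind of comment-based features the abstract describes and the correlation step used for triangulation. All data, the component stems, and the TF-IDF similarity measure are illustrative assumptions, not the authors' actual pipeline.

# Illustrative sketch only (hypothetical data and feature choices; the
# paper's actual NLP pipeline and MFRM software are not specified here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import pearsonr

# Hypothetical analytic-component stems from an EPT-style rating scale.
COMPONENT_STEMS = ["organiz", "develop", "grammar", "vocabulary"]

# Hypothetical pooled comments and MFRM infit mean-squares per rater
# (infit near 1.0 = scoring close to Rasch-model expectations).
comments = {
    "R01": "well organized and fully developed, but frequent grammar errors",
    "R02": "a pleasant essay overall, I enjoyed reading it",
    "R03": "weak development; vocabulary is imprecise and repetitive",
}
infit = {"R01": 0.95, "R02": 1.40, "R03": 1.02}

# Hypothetical scale-descriptor text for the top band.
scale_text = ("essays are clearly organized and fully developed, with "
              "accurate grammar and precise, varied vocabulary")

raters = sorted(comments)

# Feature 1: how many analytic components each rater's comments mention
# (crude substring matching; real work would use proper NLP annotation).
n_components = {r: sum(stem in comments[r] for stem in COMPONENT_STEMS)
                for r in raters}

# Feature 2: TF-IDF cosine similarity between each rater's comments and the
# scale descriptors, as a proxy for descriptor-aligned evaluative language.
tfidf = TfidfVectorizer().fit_transform([scale_text] + [comments[r] for r in raters])
similarity = cosine_similarity(tfidf[0], tfidf[1:]).ravel()

# Triangulation step: correlate a comment-based feature with the MFRM fit
# measure across raters (the study reports correlations of this kind).
rho, p = pearsonr(similarity, [infit[r] for r in raters])
print(n_components, dict(zip(raters, similarity.round(3))), rho, p)

On this toy data, the high-infit rater (R02) mentions no scale components and shares little vocabulary with the descriptors, mirroring the direction of the association the study reports between model fit and descriptor-aligned commenting.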
Source journal: Language Testing
CiteScore: 6.70
Self-citation rate: 9.80%
Articles per year: 35
About the journal: Language Testing is a fully peer reviewed international journal that publishes original research and review articles on language testing and assessment. It provides a forum for the exchange of ideas and information between people working in the fields of first and second language testing and assessment. This includes researchers and practitioners in EFL and ESL testing, and assessment in child language acquisition and language pathology. In addition, special attention is focused on issues of testing theory, experimental investigations, and the following up of practical implications.