Language Testing: Latest Articles, Page 3

Making each point count: Revising a local adaptation of the Jacobs et al. (1981) ESL COMPOSITION PROFILE rubric

IF 4.1 · Zone 1 (Literature)

Language Testing · Pub Date: 2023-12-30 · DOI: 10.1177/02655322231217979

Yu-Tzu Chang, Ann Tai Choe, Daniel Holden, Daniel R. Isbell

Abstract: In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al. (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016–2021, including 434 essays) using many-facet Rasch measurement demonstrated that the 20-point rating scales of the Jacobs et al. rubric functioned poorly due to (a) questionably small distinctions in writing quality between successive score categories and (b) the presence of several disordered categories. We reanalyzed the score data after collapsing the 20-point scales into 4-point scales to simulate a revision to the rubric. This reanalysis appeared promising, with well-ordered and distinct score categories, and only a trivial decrease in person separation reliability. After implementing this revision to the rubric, we examined data from recent administrations (2022–2023, including 93 essays) to evaluate scale functioning. As in the simulation, scale categories were well-ordered and distinct in operational rating. Moreover, no raters demonstrated exceedingly poor fit using the revised rubric. Findings hold implications for other programs adopting/adapting the PROFILE or a similar rubric.

Citations: 0
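The "disordered categories" and weakly distinguished score points described in the abstract above are properties of the Rasch-Andrich thresholds in a many-facet Rasch model. For reference, a common rating-scale form of the model (Linacre's formulation) is shown below. The three facets used here (writers, rubric categories, raters) are an assumption about this study's design; the authors may have specified the model differently, for example with category-specific thresholds.

```latex
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
```

Here P_nijk is the probability that writer n receives score k from rater j on rubric category i, B_n is the writer's ability, D_i the difficulty of rubric category i, C_j the severity of rater j, and F_k the threshold at which scores k−1 and k are equally probable. In a well-functioning scale the thresholds advance with k (F_2 < F_3 < ... < F_K); a disordered category breaks that ordering. If a 20-point scale is treated as 20 score categories, it needs 19 thresholds, while a 4-point scale needs only 3, so each remaining category draws on far more observations. This is why collapsing adjacent categories is a common remedy for sparse or disordered categories.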

Comparing two formats of data-driven rating scales for classroom assessment of pragmatic performance with roleplays

IF 4.1 · Zone 1 (Literature)

Language Testing · Pub Date: 2023-11-29 · DOI: 10.1177/02655322231210217

Yunwen Su, Sun-Young Shin

Abstract: Rating scales that language testers design should be tailored to the specific test purpose and score use as well as reflect the target construct. Researchers have long argued for the value of data-driven scales for classroom performance assessment, because they are specific to pedagogical tasks and objectives, have rich descriptors to offer useful diagnostic information, and exhibit robust content representativeness and stable measurement properties. This sequential mixed methods study compares two data-driven rating scales with multiple criteria that use different formats for pragmatic performance. They were developed using roleplays performed by 43 second-language learners of Mandarin: the hierarchical-binary (HB) scale, developed through close analysis of performance data, and the multi-trait (MT) scale derived from the HB, which has the same criteria but takes the format of an analytic scale. Results revealed the influence of format, albeit to a limited extent: MT showed a marginal advantage over HB in terms of overall reliability, practicality, and discriminatory power, though measurement properties of the two scales were largely comparable. All raters were positive about the pedagogical value of both scales. This study reveals that rater perceptions of the ease of use and effectiveness of both scales provide further insights into scale functioning.

Citations: 0
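The difference between the two scale formats in the abstract above can be sketched roughly as follows. In a hierarchical-binary scale the rater answers a sequence of yes/no questions, and each answer routes either to the next question or to a final score; in the multi-trait version the rater picks a level for each criterion directly. The questions and the four-level outcome below are hypothetical placeholders, not the criteria Su and Shin derived from their roleplay data.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Question:
    """One yes/no decision point in a hierarchical-binary (HB) scale."""
    text: str
    if_yes: "Node"
    if_no: "Node"


# A node is either another question or a terminal score (an int).
Node = Union[Question, int]


def rate(node: Node, answer: Callable[[str], bool]) -> int:
    """Walk the decision tree, asking `answer` each question, until a score is reached."""
    while isinstance(node, Question):
        node = node.if_yes if answer(node.text) else node.if_no
    return node


# Hypothetical HB scale for a single pragmatic criterion (illustrative only).
SCALE = Question(
    "Is the intended speech act recognizable?",
    if_yes=Question(
        "Is the level of directness appropriate for the interlocutor?",
        if_yes=Question(
            "Are supportive moves (e.g., reasons, apologies) used appropriately?",
            if_yes=4,
            if_no=3,
        ),
        if_no=2,
    ),
    if_no=1,
)

# One rater's hypothetical judgments for one roleplay performance.
example_answers = {
    "Is the intended speech act recognizable?": True,
    "Is the level of directness appropriate for the interlocutor?": True,
    "Are supportive moves (e.g., reasons, apologies) used appropriately?": False,
}

print(rate(SCALE, example_answers.__getitem__))  # 3
```

An analytic (MT-style) rating of the same criterion would simply be an integer chosen from 1 to 4, which shows how the two formats can share criteria yet differ in how a rater arrives at a score.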

Triangulating NLP-based analysis of rater comments and MFRM: An innovative approach to investigating raters’ application of rating scales in writing assessment

IF 4.1 · Zone 1 (Literature)

Language Testing · Pub Date: 2023-11-29 · DOI: 10.1177/02655322231210231

Huiying Cai, Xun Yan

Abstract: Rater comments tend to be qualitatively analyzed to indicate raters’ application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The data consisted of ratings on 987 essays by 36 raters (a total of 3948 analytic scores and 1974 rater comments) on a post-admission English Placement Test (EPT) at a large US university. We computed a set of comment-based features based on the analytic components and evaluative language the raters used to infer whether raters were aligned to the scale. For data triangulation, we performed correlation analyses between the MFRM measures of rater performance and the comment-based measures. Although the EPT raters showed overall satisfactory performance, we found meaningful associations between rater comments and performance features. In particular, raters with higher precision and fit to what the Rasch model predicts used more analytic components and used evaluative language more similar to the scale descriptors. These findings suggest that NLP techniques have the potential to help language testers analyze rater comments and understand rater behavior.

Citations: 0

Correction Note

IF 4.1 · Zone 1 (Literature)

Language Testing · Pub Date: 2023-11-20 · DOI: 10.1177/02655322231211735

Citations: 0

Argument-based validation of Academic Collocation Tests

Zone 1 (Literature)

Language Testing · Pub Date: 2023-10-21 · DOI: 10.1177/02655322231198499

Thi My Hang Nguyen, Peter Gu, Averil Coxhead

Abstract: Despite extensive research on assessing collocational knowledge, valid measures of academic collocations remain elusive. With the present study, we begin an argument-based approach to validate two Academic Collocation Tests (ACTs) that assess the ability to recognize and produce academic collocations (i.e., two-word units such as "key element" and "well established") in written contexts. A total of 343 tertiary students completed a background questionnaire (including demographic information, IELTS scores, and learning experience), the ACTs, and the Vocabulary Size Test. Forty-four participants also took part in post-test interviews to share reflections on the tests and retook the ACTs verbally. The findings showed that the scoring inference based on analyses of test item characteristics, testing conditions, and scoring procedures was partially supported. The generalization inference, based on the consistency of item measures and testing occasions, was justified. The extrapolation inference, drawn from correlations with other measures and factors such as collocation frequency and learning experience, received partial support. Suggestions for increasing the degree of support for the inferences are discussed. The present study reinforces the value of validation research and generates the momentum for test developers to continue this practice with other vocabulary tests.

Citations: 0

Revisiting raters’ accent familiarity in speaking tests: Evidence that presentation mode interacts with accent familiarity to variably affect comprehensibility ratings

Zone 1 (Literature)

Language Testing · Pub Date: 2023-10-14 · DOI: 10.1177/02655322231200808

Michael D. Carey, Stefan Szocs

Abstract: This controlled experimental study investigated the interaction of variables associated with rating the pronunciation component of high-stakes English-language-speaking tests such as IELTS and TOEFL iBT. One hundred experienced raters who were all either familiar or unfamiliar with Brazilian-accented English or Papua New Guinean Tok Pisin-accented English, respectively, were presented with speech samples in audio-only or audio-visual mode. Two-way ordinal regression with post hoc pairwise comparisons found that the presentation mode interacted significantly with accent familiarity to increase comprehensibility ratings (χ² = 88.005, df = 3, p < .0001), with presentation mode having a stronger effect in the interaction than accent familiarity (χ² = 59.328, df = 1, p < .0001). Based on odds ratios, raters were significantly more likely to score comprehensibility higher when the presentation mode was audio-visual (compared to audio-only) for both the unfamiliar (91% more likely) and familiar speakers (92.3% more likely). The results suggest that semi-direct speaking tests using audio-only or audio-visual modes of presentation should be evaluated through research to ascertain how accent familiarity and presentation mode interact to variably affect comprehensibility ratings. Such research may be beneficial to investigate the virtual modes of speaking test delivery that have emerged post-COVID-19.

Citations: 0
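The percentages in the Carey and Szocs abstract above can be related back to odds ratios. Assuming they were computed in the conventional way, as the percentage change in odds, (OR − 1) × 100, the implied odds ratios and log-odds coefficients would be:

```latex
\begin{aligned}
\text{\% change in odds} &= (\mathrm{OR} - 1) \times 100, \qquad \mathrm{OR} = e^{\hat\beta} \\
\text{unfamiliar: } \mathrm{OR} &= 1 + 0.91 = 1.91, \qquad \hat\beta = \ln 1.91 \approx 0.647 \\
\text{familiar: } \mathrm{OR} &= 1 + 0.923 = 1.923, \qquad \hat\beta = \ln 1.923 \approx 0.654
\end{aligned}
```

Strictly, these figures describe higher odds, not a 91% or 92.3% higher probability, of a higher comprehensibility rating under audio-visual presentation. If the model was a proportional-odds (cumulative logit) regression, the odds in question are those of a rating above any given category.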

Our validity looks like justice. Does yours?

Zone 1 (Literature)

Language Testing · Pub Date: 2023-10-07 · DOI: 10.1177/02655322231202947

Jennifer Randall, Mya Poe, David Slomp, Maria Elena Oliveri

Citations: 0

Language assessment accommodations: Issues and challenges for the future

Zone 1 (Literature)

Language Testing · Pub Date: 2023-10-01 · DOI: 10.1177/02655322231186222

Lynda Taylor, Jayanti Banerjee

Citations: 0

Accommodations in language testing and assessment: Safeguarding equity, access, and inclusion