Latest Publications in Studies in Language Assessment

Evaluating rater judgments on ETIC Advanced writing tasks: An application of generalizability theory and Many-Facet Rasch Model
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/vmak1620
Jiayu Wang, Kaizhou Luo
Abstract: Developed by China Language Assessment (CLA), the English Test for International Communication Advanced (ETIC Advanced) assesses one's ability to perform English language tasks in international workplace contexts. ETIC Advanced consists solely of writing and speaking tasks, featuring an authentic constructed-response format. Because the extended responses elicited from candidates must be judged by human raters, rating quality becomes a critical issue. This study aimed to evaluate rater judgments on the writing tasks of ETIC Advanced. The data comprised scores from 186 candidates who completed all three writing tasks: Letter Writing, Report Writing, and Proposal Writing (n = 3,348 ratings). Rating was conducted by six certified raters using a six-point, three-category analytical rating scale. Generalizability theory (GT) and the Many-Facets Rasch Model (MFRM) were applied to analyse the scores from different perspectives. Results from GT indicated that rater inconsistency and rater interactions with other facets accounted for a relatively low proportion of overall score variance, and that the ratings were sufficiently dependable for generalization. MFRM analysis revealed that the six raters differed significantly in severity yet each remained internally consistent. Bias analyses indicated that raters tended to assign more biased scores to low-proficiency candidates and to the Content category of the rating scale. The study demonstrates the combined use of GT and MFRM to evaluate rater judgments on language performance tests, and its findings have implications for ETIC rater training.
Citations: 4

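(For reference: the Many-Facets Rasch Model applied in the study above is conventionally written as a log-odds model with a separate parameter for each facet. The form below is the standard rating-scale formulation from the MFRM literature, not an equation reproduced from the paper itself.)

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

Here P_{nijk} is the probability that rater j awards category k (rather than k-1) to candidate n on task i; \theta_n is candidate ability, \delta_i is task difficulty, \alpha_j is rater severity, and \tau_k is the threshold between adjacent scale categories. Separating \alpha_j from \theta_n is what allows severity differences among raters to be estimated independently of candidate proficiency.
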
T. McNamara, U. Knoch & J. Fan. Fairness, Justice, and Language Assessment
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/nrax8588
Troy L. Cox
Abstract: n/a
Citations: 0

Examination of CEFR-J spoken interaction tasks using many-facet Rasch measurement and generalizability theory
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/bswy7332
Rie Koizumi, Emiko Kaneko, E. Setoguchi, Yo In'nami
Abstract: Attempts are underway to develop prototype tasks based on a Japanese adaptation of the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001), known as the CEFR-J (Negishi, Takada, & Tono, 2013). As part of this larger project, the current paper reports on the creation of spoken interaction tasks for five levels (Pre-A1, A1.1, A1.2, A1.3, and A2.1). The tasks were undertaken by 66 Japanese university students. Two raters evaluated the interactions using a three-level holistic rating scale, and 20% of the performances were double rated. The ratings were analysed using many-facet Rasch measurement (MFRM) and generalizability theory (G-theory). MFRM showed that all tasks fit the Rasch model well, that the scale functioned satisfactorily, and that task difficulty generally concurred with the intended CEFR-J levels. Results from a G-theory analysis employing the p × t design, with tasks as a facet, showed the proportion of variance accounted for by tasks, as well as the number of tasks required to ensure sufficiently high reliability. The MFRM and G-theory results effectively revealed areas for improving the spoken interaction tasks and demonstrated the usefulness of combining the two methods for task development and revision.
Citations: 4

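(For readers unfamiliar with the p × t design mentioned above: in a fully crossed persons-by-tasks G-study, the observed-score variance decomposes into person, task, and residual components, and the generalizability coefficient for relative decisions follows directly. The notation below is standard G-theory, not taken from the paper.)

\[
\sigma^2(X_{pt}) = \sigma^2_p + \sigma^2_t + \sigma^2_{pt,e},
\qquad
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt,e}/n'_t}
\]

where \sigma^2_p is the variance of the objects of measurement (persons), \sigma^2_t the task-facet variance, \sigma^2_{pt,e} the confounded interaction/error variance, and n'_t the number of tasks in the intended decision study. Below is a minimal sketch of the corresponding variance-component estimation via classical expected mean squares; the function names are hypothetical, and there is no claim that the authors used this implementation:

import numpy as np

def g_study_pxt(scores: np.ndarray):
    """Estimate p x t variance components from a fully crossed
    persons-by-tasks score matrix via expected mean squares."""
    n_p, n_t = scores.shape
    grand = scores.mean()
    ss_p = n_t * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_t = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_t
    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))   # sigma^2_{pt,e}
    var_p = max((ms_p - ms_res) / n_t, 0.0)     # sigma^2_p
    var_t = max((ms_t - ms_res) / n_p, 0.0)     # sigma^2_t
    return var_p, var_t, ms_res

def g_coefficient(var_p: float, var_pt_e: float, n_tasks: int) -> float:
    """Generalizability coefficient for relative decisions with n_tasks tasks."""
    return var_p / (var_p + var_pt_e / n_tasks)

Increasing n_tasks in g_coefficient shows how many tasks a decision study would need to reach a target reliability, which is the question the authors address.
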
An investigation of factors involved in Japanese students' English learning behavior during test preparation
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/fsbq6351
Takanori Sato
Abstract: Japan has recently been promoting university entrance examination reform with the goal of positively influencing students' English learning, but the extent to which entrance examinations themselves affect English learning is not known. Promoting better learning requires changing the factors that shape learning behavior, rather than merely modifying existing examinations or introducing new ones. This study investigated the factors determining Japanese students' English learning while they prepared for high-stakes university entrance examinations, aiming to construct a model that explicates how test-related and test-independent factors are intertwined. Semi-structured interviews were conducted with 14 first-year university students, asking how they had prepared for their examinations and why they had chosen particular preparation methods. Thematic analysis identified four main factors in student learning behavior (examination, student views, school, and examination-independent factors), and their relationships were explored. The findings provide useful insights for policymakers in English as a foreign language (EFL) educational contexts, where English tests are used as part of language education policies. Furthermore, the proposed model is theoretically important as it explains the complex washback mechanism and deepens our understanding of why intended washback effects on learning are not necessarily achieved.
Citations: 10

Benchmarking video presentations for CEFR usage in Cuba
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/tvkg6591
Geisa Dávila Pérez, Frank van Splunder, L. Baten, Jan Van Maele, Yoennis Díaz Moreno
Abstract: This paper discusses language assessment by means of video recordings, particularly its use for benchmarking language proficiency in a Cuban academic context. It is based on videotaped oral presentation assignments completed by Cuban PhD students for peer and teacher assessment. To avoid bias and lend validity to the results, the videotaped presentations were rated by language testing experts from three Flemish universities belonging to the Interuniversity Testing Consortium (IUTC). A selection of these assignments will be transferred to the university Moodle platform, and this compilation may seed a Cuban corpus of internationally rated academic English presentations. The results will thus provide language teachers with a growing database of video recordings to facilitate benchmarking activities and promote standardized assessment in the Cuban academic context.
Citations: 0

An investigation into rater performance with a holistic scale and a binary, analytic scale on an ESL writing placement test
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/nkdc1529
Hyunji Hayley Park, Xun Yan
Abstract: This two-phase, sequential mixed-methods study investigates how raters are influenced by different rating scales on a college-level English as a second language (ESL) writing placement test. In Phase I, nine certified raters rated 152 essays using a holistic, profile-based scale; in Phase II, they rated 200 essays using a binary, analytic scale developed from the holistic scale, plus 100 essays using both scales. Ratings were examined quantitatively through Rasch modeling and qualitatively via think-aloud protocols and semi-structured interviews. Phase I findings revealed that, despite satisfactory internal consistency, the raters demonstrated relatively low agreement and individual differences in their use of the holistic scale. Phase II findings showed that the binary, analytic scale led to marked improvement in rater consensus and consistency, and further suggested that the binary scale helped raters deconstruct the holistic scale, reducing their cognitive burden. The study represents a creative use of a binary, analytic scale to guide raters through a holistic rating scale. Implications for how a rating scale affects rating behavior and performance are discussed.
Citations: 3

Noun phrase complexity in integrated writing produced by advanced Chinese EFL learners
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/lawy6296
Lirong Xu
Abstract: This study investigates the relationship between the noun phrase complexity of advanced Chinese EFL learners' integrated writing and the scores assigned by expert raters. The learners' written performance was also compared with that of university-level native English speakers (NS), with particular reference to the use of noun phrases. One hundred and twenty integrated writing samples were collected from an English writing test administered in a southeastern province of China. Results showed a moderately positive correlation between the use of complex nominals in test-takers' writing and the corresponding score. Moreover, the non-native speaker (NNS) and NS groups differed significantly on the majority of noun phrase complexity measures. Implications are discussed concerning noun phrase complexity as a more reliable measure of syntactic complexity for integrated writing tests.
Citations: 4

Fairness in language assessment: What can the Rasch model offer?
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/jrwg5233
Jason Fan, U. Knoch
Abstract: Drawing upon discussions of fairness in the field of language assessment, this systematic review explores how the Rasch model has been used to investigate and enhance fairness in language assessment. To that end, we collected and systematically reviewed empirical studies using the Rasch model published in four leading journals in the field from 2000 to 2018. A total of 139 articles were collected and coded in NVivo 11 using the open coding method. In addition, matrix coding analysis was used to explore the relationship between the identified topics and the language constructs on which the collected articles focused. Five broad themes emerged from the coding: 1) rater effects; 2) language test design and evaluation; 3) differential group performance; 4) evaluation of rating criteria; and 5) standard setting. Representative studies under each category illustrate how the Rasch model was utilised to investigate test fairness. The findings have important implications for language assessment development and evaluation, and they identify several avenues for applying the Rasch model that researchers should explore in future studies.
Citations: 8

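(Background for the review above: the dichotomous Rasch model, from which the many-facet extensions discussed in several of the articles listed here derive. This is the standard formulation, stated for orientation rather than quoted from the article.)

\[
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
\]

where X_{ni} is person n's scored response to item i, \theta_n is person ability, and \delta_i is item difficulty. Fairness applications follow from the model's invariance property: if the data fit, item difficulty estimates should not depend on which subgroup of test takers is used to estimate them, which is what differential item functioning analyses test.
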
Rater variability across examinees and rating criteria in paired speaking assessment
Studies in Language Assessment | Pub Date: 2018-01-01 | DOI: 10.58379/yvwq3768
S. Youn
Abstract: This study investigates rater variability with regard to examinees' levels and rating criteria in paired speaking assessment. Twelve raters completed rater training and scored 102 examinees' paired speaking performances using analytical rating criteria reflecting various features of paired speaking performance. The raters were fairly consistent in their overall ratings but differed in severity. Bias analyses using many-facet Rasch measurement revealed a higher level of rater bias interaction for the rating criteria than for examinees' levels or for pairing type (the level difference between the two examinees in a pair). In particular, the most challenging rating category, Language Use, attracted significant bias interactions; however, raters did not display more frequent bias interactions on the interaction-specific rating categories, such as Engaging with Interaction and Turn Organization. Furthermore, raters tended to reverse their severity patterns across the rating categories. In the rater-by-examinee bias interactions, raters showed more frequent bias toward low-level examinees, but no significant rater bias was found for the pairing type consisting of a high-level and a low-level examinee. These findings have implications for rater training in paired speaking assessment.
Citations: 7

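(The bias analyses reported above follow the usual MFRM practice of adding an interaction term to the facets model; the form below is illustrative standard notation, not reproduced from the article.)

\[
\ln\!\left(\frac{P_{njk}}{P_{nj(k-1)}}\right) = \theta_n - \alpha_j - \tau_k - \phi_{jg}
\]

where \phi_{jg} captures the additional severity (or leniency) that rater j exercises toward element g of another facet, such as an examinee level group or a rating category. A \phi_{jg} estimate whose standardized value falls outside roughly ±2 is conventionally flagged as a significant bias interaction; effects like the Language Use and low-level-examinee patterns reported above would be identified this way.
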
Evaluating the relative effectiveness of online and face-to-face training for new writing raters
Studies in Language Assessment | Pub Date: 2018-01-01 | DOI: 10.58379/zvmm4117
U. Knoch, J. Fairbairn, C. Myford, A. Huisman
Abstract: Training writing raters for large-scale tests is commonly conducted face-to-face, but bringing raters together for training is difficult and expensive. For this reason, more and more testing agencies are exploring technological means of providing training online, and a number of studies have examined whether online rater training is a feasible alternative to face-to-face training. This mixed-methods study compared two groups of new raters, one trained on an online training platform and the other trained using conventional face-to-face procedures. Raters who passed accreditation were also compared on the reliability of their subsequent operational ratings. No significant differences in rating behaviour were identified between the two groups on the writing test. The qualitative data showed that, in general, raters enjoyed both modes of training and felt sufficiently trained, although some specific problems were encountered. Operational ratings in the first five months after training likewise showed no significant differences between the two groups. The paper concludes with implications for training raters in online environments and sets out a possible programme for further research.
Citations: 4