Latest Publications in Studies in Language Assessment

Evaluating rater judgments on ETIC Advanced writing tasks: An application of generalizability theory and Many-Facet Rasch Model
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/vmak1620
Jiayu Wang, Kaizhou Luo
Abstract: Developed by China Language Assessment (CLA), the English Test for International Communication Advanced (ETIC Advanced) assesses one's ability to perform English language tasks in international workplace contexts. ETIC Advanced consists solely of writing and speaking tasks, featuring an authentic constructed-response format. Because the extended responses elicited from candidates must be judged by human raters, rating quality becomes a critical issue. This study aimed to evaluate rater judgments on the writing tasks of ETIC Advanced. The data comprised scores from 186 candidates who completed all three writing tasks: Letter Writing, Report Writing, and Proposal Writing (n = 3,348 ratings). Rating was conducted by six certified raters using a six-point, three-category analytical rating scale. Generalizability theory (GT) and the Many-Facets Rasch Model (MFRM) were applied to analyse the scores from different perspectives. Results from GT indicated that rater inconsistency and rater interactions with other facets accounted for a relatively low proportion of overall score variance, and that the ratings were sufficiently dependable for generalization. MFRM analysis revealed that the six raters differed significantly in severity yet each remained internally consistent. Bias analyses indicated that raters tended to assign more biased scores to low-proficiency candidates and to the Content category of the rating scale. The study demonstrates the combined use of GT and MFRM to evaluate rater judgments on language performance tests, and its findings have implications for ETIC rater training.
Citations: 4

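(For reference: the Many-Facets Rasch Model applied in the study above is conventionally written as a log-odds model with a separate parameter for each facet. The form below is the standard rating-scale formulation from the MFRM literature, not an equation reproduced from the paper itself.)

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
\]

Here P_{nijk} is the probability that rater j awards category k (rather than k-1) to candidate n on task i; \theta_n is candidate ability, \delta_i is task difficulty, \alpha_j is rater severity, and \tau_k is the threshold between adjacent scale categories. Separating \alpha_j from \theta_n is what allows severity differences among raters to be estimated independently of candidate proficiency.
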
T. McNamara, U. Knoch & J. Fan. Fairness, Justice, and Language Assessment
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/nrax8588
Troy L. Cox
Abstract: n/a
Citations: 0

Examination of CEFR-J spoken interaction tasks using many-facet Rasch measurement and generalizability theory
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/bswy7332
Rie Koizumi, Emiko Kaneko, E. Setoguchi, Yo In'nami
Abstract: Attempts are underway to develop prototype tasks based on a Japanese adaptation of the Common European Framework of Reference for Languages (CEFR; Council of Europe, 2001), known as the CEFR-J (Negishi, Takada, & Tono, 2013). As part of this larger project, the current paper reports on the creation of spoken interaction tasks for five levels (Pre-A1, A1.1, A1.2, A1.3, and A2.1). The tasks were undertaken by 66 Japanese university students. Two raters evaluated the interactions using a three-level holistic rating scale, and 20% of the performances were double rated. The ratings were analysed using many-facet Rasch measurement (MFRM) and generalizability theory (G-theory). MFRM showed that all tasks fit the Rasch model well, that the scale functioned satisfactorily, and that task difficulty generally concurred with the intended CEFR-J levels. Results from a G-theory analysis employing the p × t design, with tasks as a facet, showed the proportion of variance accounted for by tasks, as well as the number of tasks required to ensure sufficiently high reliability. The MFRM and G-theory results effectively revealed areas for improving the spoken interaction tasks and demonstrated the usefulness of combining the two methods for task development and revision.
Citations: 4

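(For readers unfamiliar with the p × t design mentioned above: in a fully crossed persons-by-tasks G-study, the observed-score variance decomposes into person, task, and residual components, and the generalizability coefficient for relative decisions follows directly. The notation below is standard G-theory, not taken from the paper.)

\[
\sigma^2(X_{pt}) = \sigma^2_p + \sigma^2_t + \sigma^2_{pt,e},
\qquad
E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt,e}/n'_t}
\]

where \sigma^2_p is the variance of the objects of measurement (persons), \sigma^2_t the task-facet variance, \sigma^2_{pt,e} the confounded interaction/error variance, and n'_t the number of tasks in the intended decision study. Below is a minimal sketch of the corresponding variance-component estimation via classical expected mean squares; the function names are hypothetical, and there is no claim that the authors used this implementation:

import numpy as np

def g_study_pxt(scores: np.ndarray):
    """Estimate p x t variance components from a fully crossed
    persons-by-tasks score matrix via expected mean squares."""
    n_p, n_t = scores.shape
    grand = scores.mean()
    ss_p = n_t * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_t = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_t
    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))   # sigma^2_{pt,e}
    var_p = max((ms_p - ms_res) / n_t, 0.0)     # sigma^2_p
    var_t = max((ms_t - ms_res) / n_p, 0.0)     # sigma^2_t
    return var_p, var_t, ms_res

def g_coefficient(var_p: float, var_pt_e: float, n_tasks: int) -> float:
    """Generalizability coefficient for relative decisions with n_tasks tasks."""
    return var_p / (var_p + var_pt_e / n_tasks)

Increasing n_tasks in g_coefficient shows how many tasks a decision study would need to reach a target reliability, which is the question the authors address.
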
An investigation of factors involved in Japanese students' English learning behavior during test preparation
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/fsbq6351
Takanori Sato
Abstract: Japan has recently been promoting university entrance examination reform with the goal of positively influencing students' English learning, but the extent to which entrance examinations themselves affect English learning is not known. Promoting better learning requires changing the factors that shape learning behavior, rather than merely modifying existing examinations or introducing new ones. This study investigated the factors determining Japanese students' English learning while they prepared for high-stakes university entrance examinations, aiming to construct a model that explicates how test-related and test-independent factors are intertwined. Semi-structured interviews were conducted with 14 first-year university students, asking how they had prepared for their examinations and why they had chosen particular preparation methods. Thematic analysis identified four main factors in student learning behavior (examination, student views, school, and examination-independent factors), and their relationships were explored. The findings provide useful insights for policymakers in English as a foreign language (EFL) educational contexts, where English tests are used as part of language education policies. Furthermore, the proposed model is theoretically important as it explains the complex washback mechanism and deepens our understanding of why intended washback effects on learning are not necessarily achieved.
Citations: 10

Benchmarking video presentations for CEFR usage in Cuba
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/tvkg6591
Geisa Dávila Pérez, Frank van Splunder, L. Baten, Jan Van Maele, Yoennis Díaz Moreno
Abstract: This paper discusses language assessment by means of video recordings, particularly its use for benchmarking language proficiency in a Cuban academic context. It is based on videotaped oral presentation assignments completed by Cuban PhD students for peer and teacher assessment. To avoid bias and lend validity to the results, the videotaped presentations were rated by language testing experts from three Flemish universities belonging to the Interuniversity Testing Consortium (IUTC). A selection of these assignments will be transferred to the university Moodle platform, and this compilation may seed a Cuban corpus of internationally rated academic English presentations. The results will thus provide language teachers with a growing database of video recordings to facilitate benchmarking activities and promote standardized assessment in the Cuban academic context.
Citations: 0

An investigation into rater performance with a holistic scale and a binary, analytic scale on an ESL writing placement test
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/nkdc1529
Hyunji Hayley Park, Xun Yan
Abstract: This two-phase, sequential mixed-methods study investigates how raters are influenced by different rating scales on a college-level English as a second language (ESL) writing placement test. In Phase I, nine certified raters rated 152 essays using a holistic, profile-based scale; in Phase II, they rated 200 essays using a binary, analytic scale developed from the holistic scale, plus 100 essays using both scales. Ratings were examined quantitatively through Rasch modeling and qualitatively via think-aloud protocols and semi-structured interviews. Phase I findings revealed that, despite satisfactory internal consistency, the raters demonstrated relatively low agreement and individual differences in their use of the holistic scale. Phase II findings showed that the binary, analytic scale led to marked improvement in rater consensus and consistency, and further suggested that the binary scale helped raters deconstruct the holistic scale, reducing their cognitive burden. The study represents a creative use of a binary, analytic scale to guide raters through a holistic rating scale. Implications for how a rating scale affects rating behavior and performance are discussed.
Citations: 3

Noun phrase complexity in integrated writing produced by advanced Chinese EFL learners
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/lawy6296
Lirong Xu
Abstract: This study investigates the relationship between the noun phrase complexity of advanced Chinese EFL learners' integrated writing and the scores assigned by expert raters. The learners' written performance was also compared with that of university-level native English speakers (NS), with particular reference to the use of noun phrases. One hundred and twenty integrated writing samples were collected from an English writing test administered in a southeastern province of China. Results showed a moderately positive correlation between the use of complex nominals in test-takers' writing and the corresponding score. Moreover, the non-native speaker (NNS) and NS groups differed significantly on the majority of noun phrase complexity measures. Implications are discussed concerning noun phrase complexity as a more reliable measure of syntactic complexity for integrated writing tests.
Citations: 4

Fairness in language assessment: What can the Rasch model offer?
Studies in Language Assessment | Pub Date: 2019-01-01 | DOI: 10.58379/jrwg5233
Jason Fan, U. Knoch
Abstract: Drawing upon discussions of fairness in the field of language assessment, this systematic review explores how the Rasch model has been used to investigate and enhance fairness in language assessment. To that end, we collected and systematically reviewed empirical studies using the Rasch model published in four leading journals in the field from 2000 to 2018. A total of 139 articles were collected and coded in NVivo 11 using the open coding method. In addition, matrix coding analysis was used to explore the relationship between the identified topics and the language constructs on which the collected articles focused. Five broad themes emerged from the coding: 1) rater effects; 2) language test design and evaluation; 3) differential group performance; 4) evaluation of rating criteria; and 5) standard setting. Representative studies under each category illustrate how the Rasch model was utilised to investigate test fairness. The findings have important implications for language assessment development and evaluation, and they identify several avenues for applying the Rasch model that researchers should explore in future studies.
Citations: 8

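(Background for the review above: the dichotomous Rasch model, from which the many-facet extensions discussed in several of the articles listed here derive. This is the standard formulation, stated for orientation rather than quoted from the article.)

\[
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
\]

where X_{ni} is person n's scored response to item i, \theta_n is person ability, and \delta_i is item difficulty. Fairness applications follow from the model's invariance property: if the data fit, item difficulty estimates should not depend on which subgroup of test takers is used to estimate them, which is what differential item functioning analyses test.
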
Rater variability across examinees and rating criteria in paired speaking assessment
Studies in Language Assessment | Pub Date: 2018-01-01 | DOI: 10.58379/yvwq3768
S. Youn
Abstract: This study investigates rater variability with regard to examinees' levels and rating criteria in paired speaking assessment. Twelve raters completed rater training and scored 102 examinees' paired speaking performances using analytical rating criteria reflecting various features of paired speaking performance. The raters were fairly consistent in their overall ratings but differed in severity. Bias analyses using many-facet Rasch measurement revealed a higher level of rater bias interaction for the rating criteria than for examinees' levels or for pairing type (the level difference between the two examinees in a pair). In particular, the most challenging rating category, Language Use, attracted significant bias interactions; however, raters did not display more frequent bias interactions on the interaction-specific rating categories, such as Engaging with Interaction and Turn Organization. Furthermore, raters tended to reverse their severity patterns across the rating categories. In the rater-by-examinee bias interactions, raters showed more frequent bias toward low-level examinees, but no significant rater bias was found for the pairing type consisting of a high-level and a low-level examinee. These findings have implications for rater training in paired speaking assessment.
Citations: 7

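(The bias analyses reported above follow the usual MFRM practice of adding an interaction term to the facets model; the form below is illustrative standard notation, not reproduced from the article.)

\[
\ln\!\left(\frac{P_{njk}}{P_{nj(k-1)}}\right) = \theta_n - \alpha_j - \tau_k - \phi_{jg}
\]

where \phi_{jg} captures the additional severity (or leniency) that rater j exercises toward element g of another facet, such as an examinee level group or a rating category. A \phi_{jg} estimate whose standardized value falls outside roughly ±2 is conventionally flagged as a significant bias interaction; effects like the Language Use and low-level-examinee patterns reported above would be identified this way.
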
Evaluating the relative effectiveness of online and face-to-face training for new writing raters
Studies in Language Assessment | Pub Date: 2018-01-01 | DOI: 10.58379/zvmm4117
U. Knoch, J. Fairbairn, C. Myford, A. Huisman
Abstract: Training writing raters for large-scale tests is commonly conducted face-to-face, but bringing raters together for training is difficult and expensive. For this reason, more and more testing agencies are exploring technological means of providing training online, and a number of studies have examined whether online rater training is a feasible alternative to face-to-face training. This mixed-methods study compared two groups of new raters, one trained on an online training platform and the other trained using conventional face-to-face procedures. Raters who passed accreditation were also compared on the reliability of their subsequent operational ratings. No significant differences in rating behaviour were identified between the two groups on the writing test. The qualitative data showed that, in general, raters enjoyed both modes of training and felt sufficiently trained, although some specific problems were encountered. Operational ratings in the first five months after training likewise showed no significant differences between the two groups. The paper concludes with implications for training raters in online environments and sets out a possible programme for further research.
Citations: 4