{"title":"比较两种以数据为导向的评分量表格式,以评估课堂上的角色扮演语用表现","authors":"Yunwen Su, Sun-Young Shin","doi":"10.1177/02655322231210217","DOIUrl":null,"url":null,"abstract":"Rating scales that language testers design should be tailored to the specific test purpose and score use as well as reflect the target construct. Researchers have long argued for the value of data-driven scales for classroom performance assessment, because they are specific to pedagogical tasks and objectives, have rich descriptors to offer useful diagnostic information, and exhibit robust content representativeness and stable measurement properties. This sequential mixed methods study compares two data-driven rating scales with multiple criteria that use different formats for pragmatic performance. They were developed using roleplays performed by 43 second-language learners of Mandarin—the hierarchical-binary (HB) scale, developed through close analysis of performance data, and the multi-trait (MT) scale derived from the HB, which has the same criteria but takes the format of an analytic scale. Results revealed the influence of format, albeit to a limited extent: MT showed a marginal advantage over HB in terms of overall reliability, practicality, and discriminatory power, though measurement properties of the two scales were largely comparable. All raters were positive about the pedagogical value of both scales. 
This study reveals that rater perceptions of the ease of use and effectiveness of both scales provide further insights into scale functioning.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":"52 1","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparing two formats of data-driven rating scales for classroom assessment of pragmatic performance with roleplays\",\"authors\":\"Yunwen Su, Sun-Young Shin\",\"doi\":\"10.1177/02655322231210217\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Rating scales that language testers design should be tailored to the specific test purpose and score use as well as reflect the target construct. Researchers have long argued for the value of data-driven scales for classroom performance assessment, because they are specific to pedagogical tasks and objectives, have rich descriptors to offer useful diagnostic information, and exhibit robust content representativeness and stable measurement properties. This sequential mixed methods study compares two data-driven rating scales with multiple criteria that use different formats for pragmatic performance. They were developed using roleplays performed by 43 second-language learners of Mandarin—the hierarchical-binary (HB) scale, developed through close analysis of performance data, and the multi-trait (MT) scale derived from the HB, which has the same criteria but takes the format of an analytic scale. Results revealed the influence of format, albeit to a limited extent: MT showed a marginal advantage over HB in terms of overall reliability, practicality, and discriminatory power, though measurement properties of the two scales were largely comparable. All raters were positive about the pedagogical value of both scales. 
This study reveals that rater perceptions of the ease of use and effectiveness of both scales provide further insights into scale functioning.\",\"PeriodicalId\":17928,\"journal\":{\"name\":\"Language Testing\",\"volume\":\"52 1\",\"pages\":\"\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2023-11-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Language Testing\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1177/02655322231210217\",\"RegionNum\":1,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"LANGUAGE & LINGUISTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Language Testing","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1177/02655322231210217","RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
Comparing two formats of data-driven rating scales for classroom assessment of pragmatic performance with roleplays
Rating scales that language testers design should be tailored to the specific test purpose and score use, as well as reflect the target construct. Researchers have long argued for the value of data-driven scales in classroom performance assessment because they are specific to pedagogical tasks and objectives, offer rich descriptors with useful diagnostic information, and exhibit robust content representativeness and stable measurement properties. This sequential mixed-methods study compares two multiple-criteria, data-driven rating scales for pragmatic performance that differ in format. Both were developed from roleplays performed by 43 second-language learners of Mandarin: the hierarchical-binary (HB) scale, built through close analysis of the performance data, and the multi-trait (MT) scale derived from the HB scale, which retains the same criteria but takes the format of an analytic scale. Results revealed an influence of format, albeit a limited one: the MT scale showed a marginal advantage over the HB scale in overall reliability, practicality, and discriminatory power, though the measurement properties of the two scales were largely comparable. All raters were positive about the pedagogical value of both scales, and rater perceptions of the ease of use and effectiveness of the two scales provide further insights into scale functioning.
Journal description:
Language Testing is a fully peer-reviewed international journal that publishes original research and review articles on language testing and assessment. It provides a forum for the exchange of ideas and information among people working in the fields of first- and second-language testing and assessment, including researchers and practitioners in EFL and ESL testing, and in the assessment of child language acquisition and language pathology. In addition, special attention is given to issues of testing theory, experimental investigations, and the follow-up of practical implications.