Developing an Innovative Elicited Imitation Task for Efficient English Proficiency Assessment

JCR: Q3, Social Sciences

ETS Research Report Series, Pub Date: 2021-11-17, DOI: 10.1002/ets2.12338

Larry Davis, John Norris


Citations: 7

Abstract



The elicited imitation task (EIT), in which language learners listen to a series of spoken sentences and repeat each one verbatim, is a commonly used measure of language proficiency in second language acquisition research. The TOEFL® Essentials™ test includes an EIT as a holistic measure of speaking proficiency, referred to as the “Listen and Repeat” task type. In this report, we describe the design considerations that informed the development of the EIT for TOEFL Essentials. We also report the results of a series of investigations conducted during the prototyping and pilot phases of test development, which were undertaken with the goal of confirming task design specifications, evaluating scoring performance, and obtaining initial validity evidence to support score interpretation and use of the EIT in the TOEFL Essentials test. We found that task design variables generally performed as expected. The length of input sentence was strongly associated with performance (Pearson r = .88), consistent with the construct measured by the EIT, while other task variables not directly related to the EIT construct did not impact performance (e.g., graphics, speaker accent, and response time). Scorers drawn from TOEFL iBT test raters were able to score responses consistently with over 98% exact or adjacent interrater agreement on a 6-point scale, and scores on the pilot version of the EIT were highly reliable (Cronbach's α = .93 on the 15-item pilot version). Correlations between EIT scores and other measures were generally as expected: Correlations with other speaking tasks were high (.78–.84) and slightly to somewhat lower for other language measures (.73 for writing, .68 for listening, and .57 for reading). Correlation with an independent measure of holistic language proficiency (C-test) was moderately high (.69), as expected. 
We discuss the study findings in terms of the TOEFL Essentials test validity argument and point out limitations to the current results along with future research needs. Overall, we believe that the findings provide initial support to warrant the use of the EIT as operationalized in the TOEFL Essentials test.
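The abstract summarizes scoring quality with two statistics: exact-or-adjacent interrater agreement (the share of responses on which two raters' scores on the 6-point scale were identical or one point apart) and Cronbach's α for the 15-item pilot form. The sketch below shows how these statistics are conventionally computed. It is an illustration under stated assumptions, not the report's analysis code: the function names `exact_or_adjacent_agreement` and `cronbach_alpha` and the toy score lists are hypothetical, and none of the data come from the study.

```python
from statistics import pvariance


def exact_or_adjacent_agreement(rater_a, rater_b):
    """Share of responses whose two rater scores differ by at most one scale point."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= 1)
    return hits / len(rater_a)


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of test takers, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    if not item_scores:
        raise ValueError("need at least one test taker")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*item_scores))  # transpose: one tuple of scores per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical toy data, not from the report.
print(exact_or_adjacent_agreement([5, 3, 4, 2], [5, 4, 2, 2]))  # → 0.75
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))       # → 0.667
```

Because the same variance estimator appears in both the numerator and the denominator of α, using population variance (as here) or sample variance gives the same coefficient.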


Source journal

ETS Research Report Series (Social Sciences-Education)

CiteScore

1.20

Self-citation rate

0.00%

Publication volume