{"title":"Exploring the Idea of Task in the Context of the Young Language Learner Classroom","authors":"Veronika Timpe-Laughlin, Bianca Roters, Yuko Goto Butler","doi":"10.1002/ets2.12389","DOIUrl":"https://doi.org/10.1002/ets2.12389","url":null,"abstract":"<p>Originating in adult education, the approach of task-based language teaching (TBLT) has been promoted in young language learner (YLL) education. However, its application often encounters challenges due to varying interpretations of what constitutes a “task.” Previous research has repeatedly highlighted gaps in teachers' understanding of tasks, often reducing them to mere exercises rather than opportunities for genuine communication. A potential issue could be that some of the criteria of a task as defined in the literature that focuses on adult second/foreign language (L2) learners do not necessarily apply or may need to be modified in YLL education. For example, tasks have traditionally been defined as having “authenticity,” but this may vary, as YLLs are often engaged in play and driven by imagination. Additionally, for children, school represents their “real world,” so their concept of an “authentic” task may differ from that of adult L2 learners, who may be attending classes to improve workplace skills. In this study, we aimed to explore the concept of task in the context of teaching an additional language to YLLs in primary education. Utilizing a Delphi method, 16 well-known experts who work at the intersection of applied linguistics, TBLT, and YLLs participated in three rounds of data collection via email. After providing written definitions of a task and its characteristics in the YLL classroom in Round 1, the experts rated each other's definitions on a 4-point Likert scale and provided comments on the definitions in two subsequent rounds. Additionally, we conducted follow-up interviews with a subsample of the participants (<i>n</i> = 6) relative to a particular task characteristic: “authenticity.” Using both quantitative and qualitative analyses, we identified key aspects from the data, including task characteristics, learner considerations, and implementation details. Findings showed a distinction between “activity” and “task,” with the latter being understood as featuring certain characteristics. Accordingly, a task in the YLL classroom has a goal orientation, an orientation to meaning rather than linguistic form, a need for YLLs to use their L2 repertoire, a type of information gap, and a real-life connection. While largely congruent with the concept of task in the L2 adult literature, the experts particularly highlighted a learner-oriented approach to tasks that stresses cognitive, social-emotional, and affective development of YLLs. In particular, experts highlighted the significance of imagination as part of children's authentic world. Thus an “authentic” task for adults may reference a “real-world” domain, whereas an authentic task for YLLs may reference an imaginary one. 
We discuss the findings and emphasize that the concept of task in YLL education should be broadened to include aspects of imaginary worlds and make-believe.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2024 1","pages":"1-52"},"PeriodicalIF":0.0,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12389","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142867864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Monitoring Oral Reading Fluency From Electronic Shared Book Reading: Insights From a Full-Length Book Reading Study With Relay Reader®
Zuowei Wang, Beata Beigman Klebanov, Tenaha O'Reilly, John Sabatini
ETS Research Report Series, 2024(1), pp. 1–11. Published 2024-11-27. DOI: 10.1002/ets2.12390. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12390

Abstract: Existing research reveals a robust relationship between self-reported print exposure and long-term literacy development, yet few studies have demonstrated how reading skills change as children read a book in the short term. In this study, 50 children (mean age 9.7 years, SD = 0.8) took turns with a prerecorded narrator reading aloud a popular children's novel, producing 6,092 oral reading responses over 1,093 book passages. Each oral reading response was evaluated by a speech engine that calculated words correct per minute (WCPM). Mixed-effects models revealed that text-level differences, between-individual differences, and within-individual variations explained 13%, 56%, and 32% of the variance in WCPM, respectively. On average, children started reading the book at about 93 WCPM, and they improved by 2.26 WCPM for every 10,000 words of book reading. Random effects showed that the standard deviation of the growth rate was 1.85 WCPM, suggesting substantial individual differences in growth rate. Implications for reading instruction and assessment are discussed.

Charting the Future of Assessments
Patrick Kyllonen, Amit Sevak, Teresa Ober, Ikkyu Choi, Jesse Sparks, Daniel Fishtein
ETS Research Report Series, 2024(1), pp. 1–62. Published 2024-11-21. DOI: 10.1002/ets2.12388. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12388

Abstract: Assessment refers to a broad array of approaches for measuring or evaluating a person's (or group of persons') skills, behaviors, dispositions, or other attributes. Assessments range from standardized tests used in admissions, employee selection, licensure examinations, and domestic and international large-scale assessments of cognitive and behavioral skills to formative K–12 classroom curricular assessments. The various types of assessments are used for a wide variety of purposes, but they also have many common elements, such as standards for their reliability, validity, and fairness—even classroom assessments have standards. In this paper, we argue and provide evidence for our belief that the future of assessment contains challenges but is promising. The challenges include risks associated with security and exposure of personal data, test score bias, and inappropriate test uses, all of which may be exacerbated by the growing infiltration of artificial intelligence (AI) into our lives. The promise is increasing opportunities for testing to help individuals achieve their education and career goals and contribute to well-being and overall quality of life. To help achieve this promise we focus on the evidence-based science of measurement in education and workplace learning, a theme throughout this paper.

Insights Into Critical Discussion: Designing a Computer-Supported Collaborative Space for Middle Schoolers
Yi Song, Ralph P. Ferretti, John Sabatini, Wenju Cui
ETS Research Report Series, 2024(1), pp. 1–20. Published 2024-10-27. DOI: 10.1002/ets2.12387. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12387

Abstract: Collaborative learning environments that support students' problem solving have been shown to promote better decision-making, greater academic achievement, and more reasonable argumentation about controversial issues. In this research, we developed a technology-based critical discussion platform to support middle school students' argumentation, with a focus on evidence-based reasoning and perspective taking. A feasibility study was conducted to examine the patterns of group interaction and individual students' contributions to the critical discussion and their perceptions of the critical discussion activity. We found that more students used text-based communications than audio, but students who used audio collaborated with each other more frequently. In addition, student engagement in argumentative discourse varied greatly across groups as well as individuals. At the end of the discussion, most groups provided a solution that integrated both sides of the controversial issue. Survey and interview results suggest an overall positive experience with this technology-supported critical discussion activity. Using the insights from our research, we develop a conceptual dialogue analysis framework that identifies relevant skills under the argumentation and collaboration dimensions. In this report, we discuss our design considerations, feasibility study results, and implications of engaging students in computer-supported collaborative argumentation.

{"title":"Detecting the Impact of Remote Proctored At-Home Testing Using Propensity Score Weighting","authors":"Jing Miao, Yi Cao, Michael E. Walker","doi":"10.1002/ets2.12386","DOIUrl":"https://doi.org/10.1002/ets2.12386","url":null,"abstract":"<p>Studies of test score comparability have been conducted at different stages in the history of testing to ensure that test results carry the same meaning regardless of test conditions. The expansion of at-home testing via remote proctoring sparked another round of interest. This study uses data from three licensure tests to assess potential mode effects associated with the dual option of on-site testing at test centers and at-home testing via remote proctoring. We generated propensity score weights to balance the two self-selected groups in order to detect the mode effect on the test outcomes. We also assessed the potential impact of omitted variables on the estimated mode effect. Results of the study indicate that the demographic compositions of the test takers are similar before and after the introduction of the RP option. Examinees under the two testing modes differ slightly on certain background variables. Once the group differences are adjusted by propensity score weighting, the estimated mode effects are small and nonsystematic across test titles overall. We note some variations across subgroups based on gender and race.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2024 1","pages":"1-32"},"PeriodicalIF":0.0,"publicationDate":"2024-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12386","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142869152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Exploratory Approach to Predicting Performance in an Online Electronics Collaborative Problem-Solving Task
Jonathan Steinberg, Carol Forsyth, Jessica Andrews-Todd
ETS Research Report Series, 2024(1), pp. 1–12. Published 2024-10-24. DOI: 10.1002/ets2.12385. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12385

Abstract: In a study of 370 postsecondary students in electronics, engineering, and other science classes, we investigated the collaborative problem-solving (CPS) skills that best predict performance at individual levels in an online electronics environment. The results showed that while monitoring was a consistent predictor across levels, other skills such as executing, sharing information, planning, and maintaining communication each predicted individual performance at one or more levels of the task. The availability of background data on students' content classes and associated content knowledge for analyzing the model results can help identify cues that instructors across domains can use to help students improve specific CPS skills and achieve high performance in activities conducted in collaborative learning environments.

{"title":"Detecting Test-Taking Engagement in Changing Test Contexts","authors":"Blair Lehman, Jesse R. Sparks, Jonathan Steinberg","doi":"10.1002/ets2.12384","DOIUrl":"https://doi.org/10.1002/ets2.12384","url":null,"abstract":"<p>Over the last 20 years, many methods have been proposed to use process data (e.g., response time) to detect changes in engagement during the test-taking process. However, many of these methods were developed and evaluated in highly similar testing contexts: 30 or more single-select multiple-choice items presented in a linear, fixed sequence in which an item must be answered before progressing to the next item. However, this testing context becomes less and less representative of testing contexts in general as the affordances of technology are leveraged to provide more diverse and innovative testing experiences. The 2019 National Assessment of Educational Progress (NAEP) mathematics administration for grades 8 and 12 testing context represents an example use case that differed significantly from assessments that were typically used in previous research on test-taking engagement (e.g., number of items, item format, navigation). Thus, we leveraged this use case to re-evaluate the utility of an existing engagement detection method: normative threshold method. We decomposed the normative threshold method to evaluate its alignment with this use case and then evaluated 25 variations of this threshold-setting method with previously established evaluation criteria. Our findings revealed that this critical analysis of the threshold-setting method's alignment with the NAEP testing context could be used to identify the most appropriate variation of this method for this use case. We discuss the broader implications for engagement detection as testing contexts continue to evolve.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2024 1","pages":"1-15"},"PeriodicalIF":0.0,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12384","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142868039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AutoESD: An Automated System for Detecting Nonauthentic Texts for High-Stakes Writing Tests
Ikkyu Choi, Jiangang Hao, Chen Li, Michael Fauss, Jakub Novák
ETS Research Report Series, 2024(1), pp. 1–16. Published 2024-08-18. DOI: 10.1002/ets2.12383. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12383

Abstract: A frequently encountered security issue in writing tests is nonauthentic text submission: Test takers submit texts that are not their own but rather are copies of texts prepared by someone else. In this report, we propose AutoESD, a human-in-the-loop, automated system for detecting nonauthentic texts in large-scale writing tests, and report its performance on an operational data set. The AutoESD system uses multiple automated text-similarity measures to identify suspect texts and provides an analytics-enhanced web application to help human experts review the identified texts. To evaluate the performance of AutoESD, we obtained its similarity measures on TOEFL iBT® test writing responses collected from multiple remote administrations and examined their distributions. The results were highly encouraging: The distributional characteristics of the AutoESD similarity measures were effective in identifying suspect texts, and the measures could be computed quickly without affecting the operational score-turnaround timeline.

{"title":"Estimating Reliability for Tests With One Constructed-Response Item in a Section","authors":"Yanxuan Qu, Sandip Sinharay","doi":"10.1002/ets2.12382","DOIUrl":"https://doi.org/10.1002/ets2.12382","url":null,"abstract":"<p>The goal of this paper is to find better ways to estimate the internal consistency reliability of scores on tests with a specific type of design that are often encountered in practice: tests with constructed-response items clustered into sections that are not parallel or tau-equivalent, and one of the sections has only one item. To estimate the reliability of scores on this kind of test, we propose a two-step approach (denoted as CA_STR) that first estimates the reliability of scores on the section with a single item using the correction for attenuation method and then estimates the reliability of scores on the whole test using the stratified coefficient alpha. We compared the CA_STR method with three other reliability estimation approaches under various conditions using both real and simulated data. We found that overall, the CA_STR method performed the best and it was easy to implement.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2024 1","pages":"1-16"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12382","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142868951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating Fairness Claims for a General-Purposes Assessment of English Proficiency for the International Workplace: Do Full-Time Employees Have an Unfair Advantage Over Full-Time Students?
Jonathan Schmidgall, Yan Huo, Jaime Cid, Youhua Wei
ETS Research Report Series, 2024(1), pp. 1–20. Published 2024-05-30. DOI: 10.1002/ets2.12380. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/ets2.12380

Abstract: The principle of fairness in testing traditionally involves an assertion about the absence of bias, or that measurement should be impartial (i.e., not provide an unfair advantage or disadvantage) across groups of test takers. In more general-purposes language testing, a test taker's background knowledge is not typically considered relevant to the measurement of language proficiency; consequently, if there are systematic differences in background knowledge between groups of test takers, this background knowledge should not provide an unfair advantage or disadvantage. As a general-purposes assessment of English for everyday life and the international workplace, the TOEIC® Listening and Reading test is designed to assess the listening and reading comprehension skills of second language (L2) users of English. In this study, we investigated whether test takers with more workplace experience (full-time employees) have an unfair advantage over test takers with less workplace experience (full-time students). We conducted differential item functioning (DIF) analysis on nine forms of the test (1,800 items) and flagged 18 items (1.0%) for statistical differential functioning. An expert panel reviewed the items and concluded that none could be clearly identified as biased in favor of employed (or student) test takers. Follow-up analyses using score equity assessment found that test scores do not unfairly advantage full-time employed (versus student) test takers. Finally, we performed a content review using two expert panels that yielded examples of how workplace-oriented content is incorporated into test items without disadvantaging full-time students (versus full-time employees). The results of these analyses support claims about the impartiality (or fairness) of TOEIC Listening and Reading test scores for postsecondary test takers and add to current research on the role of background knowledge and fairness in more general-purposes language assessments.
