{"title":"Test‐Based Accountability Systems: The Importance of Paying Attention to Consequences","authors":"S. Lane","doi":"10.1002/ets2.12283","DOIUrl":"https://doi.org/10.1002/ets2.12283","url":null,"abstract":"","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12283","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41646002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Existing Data to Inform Development of New Item Types","authors":"Hongwen Guo, Guangming Ling, Lois Frankel","doi":"10.1002/ets2.12284","DOIUrl":"https://doi.org/10.1002/ets2.12284","url":null,"abstract":"","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12284","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44362369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ETS Research Report Series","authors":"","doi":"10.1002/ets2.12282","DOIUrl":"https://doi.org/10.1002/ets2.12282","url":null,"abstract":"<p><b>EIGNOR EXECUTIVE EDITOR</b></p><p>James Carlson</p><p><i>Principal Psychometrician</i></p><p><b>ASSOCIATE EDITORS</b></p><p>Beata Beigman Klebanov</p><p><i>Senior Research Scientist</i></p><p>Heather Buzick</p><p><i>Research Scientist</i></p><p>Brent Bridgeman</p><p><i>Distinguished Presidential Appointee</i></p><p>Keelan Evanini</p><p><i>Research Director</i></p><p>Marna Golub-Smith</p><p><i>Principal Psychometrician</i></p><p>Shelby Haberman</p><p><i>Distinguished Presidential Appointee</i></p><p>Anastassia Loukina</p><p><i>Research Scientist</i></p><p>John Mazzeo</p><p><i>Distinguished Presidential Appointee</i></p><p>Donald Powers</p><p><i>Principal Research Scientist</i></p><p>Gautam Puhan</p><p><i>Principal Psychometrician</i></p><p>John Sabatini</p><p><i>Managing Principal Research Scientist</i></p><p>Elizabeth Stone</p><p><i>Research Scientist</i></p><p>Rebecca Zwick</p><p><i>Distinguished Presidential Appointee</i></p><p><b>PRODUCTION EDITORS</b></p><p>Kim Fryer</p><p><i>Manager, Editing Services</i></p><p>Ayleen Gontz</p><p><i>Senior Editor</i></p><p>Since its 1947 founding, ETS has conducted and disseminated scientific research to support its products and services, and to advance the measurement and education fields. In keeping with these goals, ETS is committed to making its research freely available to the professional community and to the general public. Published accounts of ETS research, including papers in the ETS Research Report series, undergo a formal peer-review process by ETS staff to ensure that they meet established scientific and professional standards. All such ETS-conducted peer reviews are in addition to any reviews that outside organizations may provide as part of their own publication processes. Peer review notwithstanding, the positions expressed in the ETS Research Report series and other published accounts of ETS research are those of the authors and not necessarily those of theOfficers and Trustees of Educational Testing Service.</p><p>TheDaniel Eignor Editorship is named in honor of Dr.DanielR. Eignor,who from2001 until 2011 served theResearch and Development division as Editor for the ETS Research Report series.The Eignor Editorship has been created to recognize the pivotal leadership role that Dr. Eignor played in the research publication process at ETS.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12282","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"109171782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Longitudinal Development of Grammatical Complexity at the Phrasal and Clausal Levels in Spoken and Written Responses to the TOEFL iBT® Test","authors":"Bethany Gray, Joe Geluso, Phuong Nguyen","doi":"10.1002/ets2.12280","DOIUrl":"10.1002/ets2.12280","url":null,"abstract":"<p>In the present study, we take a longitudinal, corpus-based perspective to investigate short-term (over 9 months) linguistic change in the language produced for the spoken and written sections of the <i>TOEFL iBT</i>® test by a group of English-as-a-foreign-language (EFL) learners in China. The goal of the study is to identify patterns that characterize the trajectory that language learners move through in terms of their use of phrasal and clausal grammatical complexity, as mediated by mode (spoken and written) and task type (independent and integrated). Results of a multidimensional analysis reveal that in many cases, learners developed in expected ways: discourse styles at Time 1 were not always aligned with mode- and task type-specific discourse patterns but developed over time, with discourse styles at Time 2 better approximating expected norms and exhibiting increased task differentiation. These changes were particularly noteworthy for Dimension 1, which is related to phrasal and clausal complexity. Results of a developmental complexity analysis revealed more mixed results, seeming to indicate that the relatively low-proficiency learners represented by the longitudinal corpus may just be beginning on the hypothesized paths of development. The most important developments occurred for independent writing, in which students exhibited increases in the frequency of phrasal features, as well as functional expansion in the use of a range of complexity features.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":"1-51"},"PeriodicalIF":0.0,"publicationDate":"2019-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12280","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43983282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping the TOEFL iBT® Test Scores to China's Standards of English Language Ability: Implications for Score Interpretation and Use","authors":"Spiros Papageorgiou, Sha Wu, Ching-Ni Hsieh, Richard J. Tannenbaum, Mengmeng Cheng","doi":"10.1002/ets2.12281","DOIUrl":"10.1002/ets2.12281","url":null,"abstract":"<p>The past decade has seen an emerging interest in mapping (aligning or linking) test scores to language proficiency levels of external performance scales or frameworks, such as the Common European Framework of Reference (CEFR), as well as locally developed frameworks, such as China's Standards of English Language Ability (CSE). Such alignment is ultimately a claim about the interpretation of test scores in relation to external levels of language proficiency. To support such a claim, established procedures should be carefully implemented and multiple sources of evidence should be collected. In this research report, we demonstrate the application of a series of steps in building an argument for aligning the scores of the <i>TOEFL iBT®</i> test, an international, large-scale language proficiency test of English as a foreign language (EFL), to the levels of the CSE. The alignment process comprised the following steps: (a) establishing construct congruence between the TOEFL iBT test and the CSE; (b) establishing recommended minimum test scores (cut scores), set by local experts, to classify language learners into the local proficiency levels; (c) collection of scores by test takers (<i>N</i> = 1,326) and evaluations of the test takers' proficiency levels by their teachers, based on the local framework; and (d) consideration of the results of other alignment studies in the local context as well as the link between the CEFR and the CSE levels. We conclude with a discussion of the contextual issues that should be considered when interpreting test scores in relation to external proficiency levels. These contextual issues are important considerations because they have the potential to impact score-based decisions on individuals and institutions. We also discuss the implications for similar alignment research.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":"1-49"},"PeriodicalIF":0.0,"publicationDate":"2019-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12281","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49324009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Measure Matters: Examining Achievement Gaps on Cognitively Demanding Reading and Mathematics Assessments","authors":"Marisol J. C. Kevelson","doi":"10.1002/ets2.12278","DOIUrl":"10.1002/ets2.12278","url":null,"abstract":"<p>This study presents estimates of Black–White, Hispanic–White, and income achievement gaps using data from two different types of reading and mathematics assessments: constructed-response assessments that were likely more cognitively demanding and state achievement tests that were likely less cognitively demanding (i.e., composed solely or largely of multiple-choice items). Specifically, the study utilized multilevel modeling of data from over 25,000 fourth- through eighth-grade students participating in the 6-state Measures of Effective Teaching (MET) study of 2009–2010, including data from the state reading and mathematics achievement tests used in MET districts at that time and data from the Stanford Achievement Test Open-Ended Reading Assessment (SAT-9OE) and the Balanced Assessment of Mathematics (BAM). The latter two assessments, consisting entirely of constructed-response items, were selected by MET researchers to assess learning outcomes, such as those included in the Common Core State Standards, deemed more cognitively complex than those assessed by state achievement tests at the time. The investigator found that estimated Black–White, Hispanic–White, and income achievement gaps were smaller on the SAT-9OE than on state reading assessments, before accounting for other relevant factors. Estimates of Black–White and Hispanic–White mathematics achievement gaps were slightly larger using BAM data, whereas the estimated income achievement gap was slightly smaller using BAM data. In later models, prior student academic achievement and average student subject-specific prior achievement accounted for portions of these estimated achievement gaps.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":"1-28"},"PeriodicalIF":0.0,"publicationDate":"2019-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12278","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44736035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Variance in Common Population Linking Bridge Studies","authors":"Paul A. Jewsbury","doi":"10.1002/ets2.12279","DOIUrl":"10.1002/ets2.12279","url":null,"abstract":"<p>When an assessment undergoes changes to the administration or instrument, bridge studies are typically used to try to ensure comparability of scores before and after the change. Among the most common and powerful is the common population linking design, with the use of a linear transformation to link scores to the metric of the original assessment. In the common population linking design, randomly equivalent samples receive the new and previous administration or instrument. However, conventional procedures to estimate error variances are not appropriate for scores linked in a bridge study, because the procedures neglect variance due to linking. A convenient approach is to estimate a variance component associated with the linking to add to the conventionally estimated error variance. Equations for the variance components in this approach are derived, and the approximations inherently made in this approach are shown and discussed. Exact error variances of linked scores, accounting for both conventional sources of variance (e.g., sampling) and linking variance together, are derived and discussed. The consequences of how linking changes how certain errors are related is considered mathematically. Specifically, the impacts of linking on the error variance for the comparison of two linked estimates (e.g., comparing the mean score of boys to the mean score of girls, after linking), for the comparison of scores across the two samples (e.g., comparing the mean score of boys in the new administration or instrument to the mean score of boys in the old administration or instrument), and for aggregating scores across the two samples (e.g., the mean score of boys across both administrations or instruments) are derived and discussed. Finally, general methods to account for error variance in bridge studies by simultaneously accounting for both conventional and linking sources of error are recommended.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":"1-31"},"PeriodicalIF":0.0,"publicationDate":"2019-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12279","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45583282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Teachers' Views of Their Practices Related to Common Core State Standards-Aligned Assessments","authors":"Heather M. Buzick, Anna Rhoad-Drogalis, Cara C. Laitusis, Teresa C. King","doi":"10.1002/ets2.12277","DOIUrl":"10.1002/ets2.12277","url":null,"abstract":"<p>A fundamental claim for Common Core State Standards (CCSS)-aligned assessments is that they will lead to better teaching practices. The purpose of this study is to seek evidence in support of this claim by surveying teachers about their instructional practices, test preparation strategies, and test score use both before and after the introduction of CCSS-aligned assessments. Baseline and trend data were collected via five Web-based surveys, administered over 2 years to elementary and middle school English language arts and mathematics teachers in one state, New Jersey. Responses to the first three surveys (<i>n</i><sub>1</sub> = 402 teachers, <i>n</i><sub>2</sub> = 469 teachers, and <i>n</i><sub>3</sub> = 175 teachers from 4% to 6% of New Jersey schools) are summarized and described; results from the remaining surveys are omitted due to low response. Challenges to collecting empirical evidence in support of the validity argument and theory of action for a new assessment are discussed.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":"1-18"},"PeriodicalIF":0.0,"publicationDate":"2019-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12277","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43576203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Psychometric Considerations and a General Scoring Strategy for Assessments of Collaborative Problem Solving","authors":"Jiangang Hao, Lei Liu, Patrick Kyllonen, Michael Flor, Alina A. von Davier","doi":"10.1002/ets2.12276","DOIUrl":"10.1002/ets2.12276","url":null,"abstract":"<p>Collaborative problem solving (CPS) is an important 21st-century skill that is crucial for both career and academic success. However, developing a large-scale and standardized assessment of CPS that can be administered on a regular basis is very challenging. In this report, we introduce a set of psychometric considerations and a general scoring strategy around assessing CPS, summarized based on the results of the extensive empirical studies we conducted at Educational Testing Service (ETS) over the past 6 years. Using the ETS Collaborative Science Assessment Prototype as an example, we show how these psychometric considerations have been incorporated into the development of the assessment prototype and how the scoring strategy has been implemented.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":"1-17"},"PeriodicalIF":0.0,"publicationDate":"2019-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12276","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43337444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distractor Analysis for Multiple-Choice Tests: An Empirical Study With International Language Assessment Data","authors":"Shelby J. Haberman, Yang Liu, Yi-Hsuan Lee","doi":"10.1002/ets2.12275","DOIUrl":"10.1002/ets2.12275","url":null,"abstract":"<p>Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and a NR model for selections of incorrect distractors, and (c) a model in which the item score satisfies a two-parameter logistic model and distractor selection and proficiency are conditionally independent, given that an incorrect response is selected. Model comparisons involve generalized residuals, information measures, scale scores, and reliability estimates. To illustrate the methodology, a study of an international assessment of proficiency of nonnative speakers of a single target language used to make high-stakes decisions compares the models under study.</p>","PeriodicalId":11972,"journal":{"name":"ETS Research Report Series","volume":"2019 1","pages":"1-16"},"PeriodicalIF":0.0,"publicationDate":"2019-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/ets2.12275","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43520147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}