M. Wolf, J. Herman, Jinok S. Kim, J. Abedi, Seth Leon, Noelle C. Griffin, Patina L. Bachman, Sandy Chang, Tim Farnsworth, Hyekyung Jung, J. Nollner, & H. Shin. "Providing Validity Evidence to Improve the Assessment of English Language Learners. CRESST Report 738." National Center for Research on Evaluation, Standards, and Student Testing, August 2008. doi:10.1037/e643102011-001.

Abstract: This research project addresses the validity of assessments used to measure the performance of English language learners (ELLs), such as those mandated by the No Child Left Behind Act of 2001 (NCLB, 2002). The goals of the research are to help educators understand and improve ELL performance by investigating the validity of their current assessments, and to provide states with much-needed guidance for improving the validity of their English language proficiency (ELP) and academic achievement assessments for ELL students. The research has three phases. In the first phase, the researchers analyze existing data and documents to understand the nature and validity of states' current practices and their priority needs. This first phase is exploratory: the researchers identify key validity issues by examining the existing data and formulate research areas where further investigation is needed in the second phase. In the second phase, the researchers will deepen their analysis of the areas identified from Phase I findings. In the third phase, the researchers will develop specific guidelines on which states may base their ELL assessment policy and practice. The present report focuses on the researchers' Phase I research activities and results. It also discusses preliminary implications and recommendations for improving ELL assessment systems.

Acknowledgments: We would like to thank Lyle Bachman, Alison Bailey, Frances Butler, Diane August, and Guillermo Solano-Flores for their valuable comments on earlier drafts of this report. We are also very grateful to our three participating states for their willingness to share their data and for their support of our work.
M. Wolf, J. Herman, Lyle F. Bachman, A. Bailey, & Noelle C. Griffin. "Recommendations for Assessing English Language Learners: English Language Proficiency Measures and Accommodation Uses. Recommendations Report (Part 3 of 3). CRESST Report 737." National Center for Research on Evaluation, Standards, and Student Testing, July 2008. doi:10.1037/e643112011-001.

Abstract: The No Child Left Behind Act of 2001 (NCLB, 2002) has had a great impact on states' policies for assessing English language learner (ELL) students. The legislation requires states to develop or adopt sound assessments in order to validly measure ELL students' English language proficiency, as well as their content knowledge and skills. While states have moved rapidly to meet these requirements, they face challenges in validating their current assessment and accountability systems for ELL students, partly due to a lack of resources. Considering the significant role of assessment in guiding decisions about organizations and individuals, validity is a paramount concern. In light of this, we reviewed the current literature and policy regarding ELL assessment in order to inform practitioners of the key issues to consider in their validation process. Drawing on our review of literature and practice, we developed a set of guidelines and recommendations for practitioners to use as a resource for improving their ELL assessment systems. The present report is the last component of the series, providing recommendations for state policy and practice in assessing ELL students. It also discusses areas for future research and development.

Introduction and Background: English language learners (ELLs) are the fastest growing subgroup in the nation. Over the 10-year period between the 1994–1995 and 2004–2005 school years, the enrollment of ELL students grew over 60%, while total K–12 enrollment grew just over 2% (Office of English Language Acquisition [OELA], n.d.). The growth rate is even more striking in some states: North Carolina and Nevada, for instance, have reported ELL population growth of 500% and 200%, respectively, over the past 10-year period (Batalova, Fix, & Murray, 2005, as cited in Short & Fitzsimmons, 2007). Not only is the size of the ELL population growing, but the diversity of these students is becoming more extensive. Over 400 different languages are reported among these students, and schooling experience varies depending on the students' […]

Acknowledgments: We would like to thank the following for their valuable comments and suggestions on earlier drafts of this report: Jamal Abedi, Diane August, Margaret Malone, Robert J. Mislevy, Charlene Rivera, Lourdes Rovira, Robert Rueda, Guillermo Solano-Flores, and Lynn Shafer Willner. Our sincere thanks also go to Jenny Kao, Patina L. Bachman, and Sandy M. Chang for their useful suggestions and invaluable research assistance.
{"title":"Templates and Objects in Authoring Problem-Solving Assessments. CRESST Report 735.","authors":"Terry P. Vendlinski, E. Baker, D. Niemi","doi":"10.4324/9781315096773-16","DOIUrl":"https://doi.org/10.4324/9781315096773-16","url":null,"abstract":"","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2008-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81988123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Wolf, Jenny C. Kao, J. Herman, Lyle F. Bachman, A. Bailey, Patina L. Bachman, Tim Farnsworth, & Sandy Chang. "Issues in Assessing English Language Learners: English Language Proficiency Measures and Accommodation Uses. Literature Review (Part 1 of 3). CRESST Report 731." National Center for Research on Evaluation, Standards, and Student Testing, January 2008. doi:10.1037/e643592011-001.

Abstract: The No Child Left Behind (NCLB) Act has had a great impact on states' policies for assessing English language learner (ELL) students. The legislation requires states to develop or adopt sound assessments in order to validly measure ELL students' English language proficiency (ELP), as well as their content knowledge and skills. Although states have moved rapidly to meet these requirements, they face challenges in validating their current assessment and accountability systems for ELL students, partly due to a lack of resources. Considering the significant role of assessments in guiding decisions about organizations and individuals, it is of paramount importance to establish a valid assessment system. In light of this, we reviewed the current literature and policy regarding ELL assessment in order to inform practitioners of the key issues to consider in their validation processes. Drawing on our review of literature and practice, we developed a set of guidelines and recommendations for practitioners to use as a resource for improving their ELL assessment systems. We have compiled a series of three reports. The present report is the first component of the series, reviewing the literature pertinent to assessing ELL students. The areas reviewed include validity theory, the construct of ELP assessments, and the effects of accommodations in the assessment of ELL students' content knowledge.
{"title":"Creating Accurate Science Benchmark Assessments to Inform Instruction. CSE Technical Report 730.","authors":"Terry P. Vendlinski, Sam O. Nagashima, J. Herman","doi":"10.1037/e643602011-001","DOIUrl":"https://doi.org/10.1037/e643602011-001","url":null,"abstract":"Current educational policy highlights the important role that assessment can play in improving education. State standards and the assessments that are aligned with them establish targets for learning and promote school accountability for helping all students succeed; at the same time, feedback from assessment results is expected to provide districts, schools, and teachers with important information for guiding instructional planning and decision making. Yet even as No Child Left Behind (NCLB) and its requirements for adequate yearly progress put unprecedented emphasis on state tests, educators have discovered that annual state tests are too little and too late to guide teaching and learning. Recognizing the need for more frequent assessments to support student learning, many districts and schools have turned to benchmark testing—periodic assessments through which districts can monitor students’ progress, and schools and teachers can refine curriculum and teaching—to help students succeed. We report in this document a collaborative effort of teachers, district administrators, professional developers, and assessment researchers to develop benchmark assessments for elementary school science. In the sections which follow we provide the rationale for our work and its research question, describe our collaborative assessment development process and its results, and present conclusions.","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"107 9‐12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91418813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Franke, N. Webb, Angela G. Chan, Dan Battey, Marsha Ing, Deanna Freund, & Tondra De. "Eliciting Student Thinking in Elementary School Mathematics Classrooms. CRESST Report 725." National Center for Research on Evaluation, Standards, and Student Testing, August 2007. doi:10.1037/e643702011-001.
Eva Chen, D. Niemi, Jia Wang, Haiwen Wang, & J. Mirocha. "Examining the Generalizability of Direct Writing Assessment Tasks. CSE Technical Report 718." National Center for Research on Evaluation, Standards, and Student Testing, June 2007. doi:10.1037/e643812011-001.

Abstract: This study investigated the level of generalizability across a few high-quality assessment tasks and the validity of measuring student writing ability using a limited number of essay tasks. More specifically, the research team explored how well writing prompts could measure general writing ability and whether student performance on one writing task could be generalized to other, similar writing tasks. Four writing prompts were used in the study: three literature-based tasks and one task based on a short story. A total of 397 students participated, and each student was randomly assigned to complete two of the four tasks. The research team found that three to five essays were required to evaluate and make a reliable judgment of student writing performance.

From the report's introduction: Performance assessment can serve to measure important and complex learning outcomes (Resnick & Resnick, 1989), provide a more direct measurement of student ability (Frederiksen, 1984; Glaser, 1991; Guthrie, 1984), and help guide improvement in instructional practices (Baron, 1991; Bennett, 1993). Of the various types of performance assessment, direct tests of writing ability have gained the most acceptance in state and national assessment programs (Afflerbach, 1985; Applebee, Langer, Jenkins, Mullis, & Foertsch, 1990; Applebee, Langer, & Mullis, 1995). Advocates of direct writing assessment point out that students need more exposure to writing in the form of instruction and more frequent examinations (Breland, 1983). However, there are problems associated with using essays to measure students' writing abilities, such as the objectivity of ratings and the generalizability of scores across raters and tasks (Crehan, 1997). Previous generalizability studies of direct writing assessment […]
A. Bailey, Becky H. Huang, H. Shin, Tim Farnsworth, & Frances A. Butler. "Developing Academic English Language Proficiency Prototypes for 5th Grade Reading: Psychometric and Linguistic Profiles of Tasks. An Extended Executive Summary. CSE Report 720." National Center for Research on Evaluation, Standards, and Student Testing, June 2007. doi:10.1037/e643792011-001.
{"title":"School Improvement under Test-Driven Accountability: A Comparison of High- and Low-Performing Middle Schools in California. CSE Report 717.","authors":"H. Mintrop, Tina Trujillo","doi":"10.1037/e643832011-001","DOIUrl":"https://doi.org/10.1037/e643832011-001","url":null,"abstract":"Based on in-depth data from nine demographically similar schools, the study asks five questions in regard to key aspects of the improvement process and that speak to the consequential validity of accountability indicators: Do schools that differ widely according to system performance criteria also differ on the quality of the educational experience they provide to students? Are schools that have posted high growth on the state’s performance index more effective organizationally? Do high-performing schools respond more productively to the messages of their state accountability system? Do highand low-performing schools exhibit different approaches to organizational learning and teacher professionalism? Is district instructional management in an aligned state accountability system related to performance? We report our findings in three results papers1 (Mintrop & Trujillo, 2007a, 2007b; Trujillo & Mintrop, 2007) and this technical report. The results papers, in a nutshell, show that, across the nine case study schools, one positive performance outlier differed indeed in the quality of teaching, organizational effectiveness, response to accountability, and patterns of organizational learning. Across the other eight schools, however, the patterns blurred. We conclude that, save for performance differences on the extreme positive and negative margins, relationships between system-designated performance levels and improvement processes on the ground are uncertain and far from solid. The papers try to elucidate why this may be so. This final technical report summarizes the major components of the study design and methodology, including case selection, instrumentation, data collection, and data analysis techniques. We describe the context of the study as well as descriptive data on our cases and procedures. School improvement is an intricate business. Whether a school succeeds in improving is dependent on a host of factors. Factors come into play that are internal and external to the organization. The motivation and capacity of the workforce, the 1 The three reports are entitled Accountability Urgency, Organizational Learning, and Educational Outcomes: A Comparative Analysis of California Middle Schools; The Practical Relevance of Accountability Systems for School Improvement: A Descriptive Analysis of California Schools; and Centralized Instructional Management: District Control, Organizational Culture, and School Performance.","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"112 2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91024258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Changes in the Black-White Test score Gap in the Elementary School Grades. CSE Report 715.","authors":"D. Koretz, Y. Kim","doi":"10.1037/e643902011-001","DOIUrl":"https://doi.org/10.1037/e643902011-001","url":null,"abstract":"In a pair of recent studies, Fryer and Levitt (2004a, 2004b) analyzed the Early Childhood Longitudinal Study – Kindergarten Cohort (ECLS-K) to explore the characteristics of the Black-White test score gap in young children. They found that the gap grew markedly between kindergarten and the third grade and that they could predict the gap from measured characteristics in kindergarten but not in the third grade. In addition, they found that the widening of the gap was differential across areas of knowledge and skill, with Blacks falling behind in all areas other than the most basic. They raised the possibility that Black and Whites may not be on “parallel trajectories” and that Blacks, as they go through school, may never master some skills mastered by Whites. This study re-analyzes the ECLS-K data to address this last question. We find that the scores used by Fryer and Levitt (proficiency probability scores, or PPS) do not support the hypothesis of differential growth of the gap. The patterns they found reflect the nonlinear relationships between overall proficiency, θ , and the PPS variables, as well as ceiling effects in the PPS distributions. Moreover, θ is a sufficient statistic for the PPS variables, and therefore, PPS variables merely re-express the overall mean difference between groups and contain no information about qualitative differences in performance between Black and White students at similar levels of θ . We therefore carried out differential item functioning (DIF) analyses of all items in all rounds of the ECLS-K through grade 5 (Round 6), excluding only the fall of grade 1 (which was a very small sample) and subsamples in which there were too few Black students for reasonable analysis. We found no relevant patterns in the distribution of the DIF statistics or in the characteristics of the items showing DIF that support the notion of differential divergence, other than in kindergarten and the first grade, where DIF favoring Blacks tended to be on items tapping simple skills taught outside of school (e.g., number recognition), while DIF disfavoring Blacks tended to be on material taught more in school (e.g., arithmetic). However, there were exceptions to this. Moreover, because of its construction and reporting, the ECLS-K data were not ideal for addressing this 1Young-Suk Kim is currently at the Florida Center for Reading Research (FCRR) and Department of Childhood Education, Reading, and Disability Services, College of Education, Florida State University","PeriodicalId":19116,"journal":{"name":"National Center for Research on Evaluation, Standards, and Student Testing","volume":"89 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2007-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85838700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}