{"title":"Educational Measurement: Models, Methods, and Theory","authors":"Lauress L. Wise, Daisy W. Rutstein","doi":"10.1111/emip.12642","DOIUrl":"https://doi.org/10.1111/emip.12642","url":null,"abstract":"<p>This article describes an amazing development of methods and models supporting educational measurement together with a much slower evolution of theory about how and what students learn and how educational measurement best supports that learning. Told from the perspective of someone who has lived through many of these changes, the article provides background on these developments and insights into challenges and opportunities for future development.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"83-87"},"PeriodicalIF":2.7,"publicationDate":"2024-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurement Must Be Qualitative, then Quantitative, then Qualitative Again","authors":"Andrew D. Ho","doi":"10.1111/emip.12662","DOIUrl":"https://doi.org/10.1111/emip.12662","url":null,"abstract":"<p>Educational measurement is a social science that requires both qualitative and quantitative competencies. Qualitative competencies in educational measurement include developing and applying theories of learning, designing instruments, and identifying the social, cultural, historical, and political contexts of measurement. Quantitative competencies include statistical inference, computational fluency, and psychometric modeling. I review 12 commentaries authored by past presidents of the National Council on Measurement in Education (NCME) published in a special issue prompting them to reflect on the past, present, and future of educational measurement. I explain how a perspective on both qualitative and quantitative competencies yields common themes across the commentaries. These include the appeal and challenge of personalization, the necessity of contextualization, and the value of communication and collaboration. I conclude that elevation of both qualitative and quantitative competencies underlying educational measurement provides a clearer sense of how NCME can advance its mission, “to advance theory and applications of educational measurement to benefit society.”</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"137-145"},"PeriodicalIF":2.7,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Admission Testing in Higher Education: Changing Landscape and Outcomes from Test-Optional Policies","authors":"Wayne Camara","doi":"10.1111/emip.12651","DOIUrl":"https://doi.org/10.1111/emip.12651","url":null,"abstract":"<p>Access to admission tests was greatly restricted during the COVID-19 pandemic resulting in widespread adoption of test-optional policies by colleges and universities. Many institutions adopted such policies on an interim or trial basis, as many others signaled the change would be long term. Several Ivy League institutions and selective public flagship universities have returned to requiring test scores from all applicants citing their own research indicating diversity and ensuring academic success of applicants can be best served by inclusion of test scores in the admissions process. This paper reviews recent research on the impact of test-optional policies on score-sending behaviors of applicants and differential outcomes in college and score sending. Ultimately, test-optional policies are neither the panacea for diversity that proponents suggested nor do they result in a decay of academic outcomes that opponents forecast, but they do have consequences, which colleges will need to weigh going forward.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"104-111"},"PeriodicalIF":2.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leading ITEMS: A Retrospective on Progress and Future Goals","authors":"Brian C. Leventhal","doi":"10.1111/emip.12661","DOIUrl":"https://doi.org/10.1111/emip.12661","url":null,"abstract":"<p>As this issue marks the conclusion of my tenure as editor of the Instructional Topics in Educational Measurement Series (ITEMS), I take this opportunity to reflect on the progress made during my term and to outline potential future directions for the publication.</p><p>First, I extend my gratitude to the National Council on Measurement in Education (NCME) and the publications committee for entrusting me with the role of editor and for their unwavering support of my vision for ITEMS. I am also deeply appreciative of Richard Feinberg, who served as associate editor throughout my tenure, and Zhongmin Cui, editor of <i>Educational Measurement: Issues and Practice</i> (<i>EM:IP</i>) for their invaluable collaboration. Additionally, I thank all the authors who contributed modules and the dedicated readership that has engaged with the content.</p><p>ITEMS stands as a distinctive publication, bridging the gap between research and education by offering learning modules on both emerging and established practice in educational measurement. I saw the primary objective of ITEMS as to provide accessible learning resources to a diverse audience, including practitioners, students, partners, stakeholders, and the general public. These modules serve various purposes; practitioners may seek to research or expand their skills, students and professors may use them to complement classroom learning, partners and stakeholders may develop foundational knowledge to enhance collaboration with measurement professionals, and the public may gain insights into tests they encounter in their daily lives. Addressing the needs of such a broad audience is challenging, yet it underscores the essential role that ITEMS plays.</p><p>Upon assuming the role of editor three years ago, ITEMS had recently transitioned from static articles to interactive digital modules. My efforts focused on furthering this transformation by enhancing the engagement of digital publications and streamlining the development process. Although much of this work occurred behind the scenes, the benefits are evident to learners. The modules are now easily accessible on the NCME website, available in both digital and print formats. Newer modules include downloadable videos for offline use or course integration. Content is now accessible across multiple devices, including computers, phones and tablets. Authors also benefit from the updated development process, which now uses familiar software such as Microsoft PowerPoint or Google Slides. Comprehensive documentation, including timelines, deliverables, and templates, supports authors throughout the development process, allowing them to focus on content creation rather than formatting and logistics.</p><p>Reflecting on my tenure, I am proud of the modules published, yet I recognize areas for improvement and future growth. Recruiting authors and maintaining content development posed significant challenges, with some modules remaining incomplete. 
I am hopeful that th","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"169"},"PeriodicalIF":2.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12661","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Application of Text Embeddings to Support Alignment of Educational Content Standards","authors":"Reese Butterfuss, Harold Doran","doi":"10.1111/emip.12641","DOIUrl":"https://doi.org/10.1111/emip.12641","url":null,"abstract":"<p>Large language models are increasingly used in educational and psychological measurement activities. Their rapidly evolving sophistication and ability to detect language semantics make them viable tools to supplement subject matter experts and their reviews of large amounts of text statements, such as educational content standards. This paper presents an application of text embeddings to find relationhips between different sets of educational content standards in a content mapping process. Content mapping is routinely used by state education agencies and is often a requirement of the United States Department of Education peer review process. We discuss the educational measurement problem, propose a formal methodology, demonstrate an application of our proposed approach, and provide measures of its accuracy and potential to support real-world activities.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"73-83"},"PeriodicalIF":2.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Should Psychometricians Know about the History of Testing and Testing Policy?","authors":"Lorrie A. Shepard","doi":"10.1111/emip.12650","DOIUrl":"https://doi.org/10.1111/emip.12650","url":null,"abstract":"<p>In 2023, a National Council on Measurement in Education Presidential Task Force developed a consensus framework for foundational competencies in educational measurement to guide graduate programs and subsequent professional development. This article elaborates on the social, cultural, historical, and political context subdomain from that framework. A graduate course on the history of testing and testing policy in the United States is proposed to help measurement professionals develop an understanding of historic belief systems and theories of action that affect every aspect of testing applications—definition of constructs, instrument design, respondents’ interactions, interpretations and use of results, and both intended and unintended consequences. Two, accessible, key readings are proposed for each of 14 weeks addressing the following topics: IQ testing and deficit perspectives; special education placements, disproportionality, and accommodations; grade retention and tracking; college admissions testing; standards-based reforms; 1990s performance assessment innovations; NCLB and school accountability; achievement gaps and opportunity to learn; NAEP and international assessments; standard setting and NAEP achievement levels; Common Core State Standards and ESSA; formative assessment and research on learning; culturally responsive assessment.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"46-61"},"PeriodicalIF":2.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12650","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In the beginning, there was an item…","authors":"Deborah J. Harris, Catherine J. Welch, Stephen B. Dunbar","doi":"10.1111/emip.12647","DOIUrl":"https://doi.org/10.1111/emip.12647","url":null,"abstract":"<p>As educational researchers, we take scored item responses, create data sets to analyze, draw inferences from those analyses, and make decisions, about students’ educational knowledge and future success, judge how successful educational programs are, determine what to teach tomorrow, and so on. It is good to remind ourselves that the basis for all our analyses, from simple means to complex multilevel, multidimensional modeling, interpretations of those analyses, and decisions we make based on the analyses are at the core based on a test taker responding to an item. With all the emphasis on modeling, analyses, big data, machine learning, etc., we need to remember it all starts with the items we collect information on. If we get those wrong, then the results of subsequent analyses are unlikely to provide the information we are seeking.</p><p>It is true that how students and educators interact with items has changed, and continues to change. More and more of the student-item interactions are happening online, and the days when an educator had relatively easy access to the actual test items, often after test administration, are in the past. This lack of access is also true for the researchers analyzing the response data: instead of a single test booklet aligned to a data file of test taker responses, there are large pools of items, and while the researcher may know a test taker was administered, say, item #SK-65243-0273A and what the response was, they do not know what the text of the item actually was, which can make it challenging to interpret analysis results at times.</p><p>From having a test author write the items for an assessment, to contracting with content specialists to draft items, to cloning items from a template, to having large language models/artificial intelligence produce items, item development has morphed over the past and present, and will continue to morph into the future. Item tryouts for pretesting the quality and functioning of an item, including gathering data for generating item statistics to aid in forms construction and in some instances scoring, now attempt to develop algorithms that can accurately predict item characteristics, including item statistics, without gathering item data in advance of operational use (or at all). We are developing more innovative item types, and collecting more data, such as latencies, click streams, and other process data on student responses to those items.</p><p>Sometimes we are so enamored of what we can do with the data, the analyses seem distant from the actual experience: a test taker responding to an item. And this makes it challenging at times to interpret analysis results in terms of actionable steps. 
Our aim here is to examine the evolution of how items are developed and considered, concentrating on large-scale, K–12 educational assessments.</p><p>The <i>Standards for Educational and Psychological Testing</i> (<i>Standards</i>; American Educational Research Association [AERA], the ","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"40-45"},"PeriodicalIF":2.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12647","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurement Invariance for Multilingual Learners Using Item Response and Response Time in PISA 2018","authors":"Jung Yeon Park, Sean Joo, Zikun Li, Hyejin Yoon","doi":"10.1111/emip.12640","DOIUrl":"https://doi.org/10.1111/emip.12640","url":null,"abstract":"<p>This study examines potential assessment bias based on students' primary language status in PISA 2018. Specifically, multilingual (MLs) and nonmultilingual (non-MLs) students in the United States are compared with regard to their response time as well as scored responses across three cognitive domains (reading, mathematics, and science). Differential item functioning (DIF) analysis reveals that 7–14% of items exhibit DIF-related problems in scored responses between the two groups, aligning with PISA technical report results. While MLs generally spend more time on the test than non-MLs across cognitive levels, differential response time (DRT) functioning identifies significant time differences in 7–10% of items for students with similar cognitive levels. It was noticeable that items with DIF and DRT issues show limited overlap, suggesting diverse reasons for student struggles in the assessment. A deeper examination of item characteristics is recommended for test developers and teachers to gain a better understanding of these nuances.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"55-65"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12640","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"You Win Some, You Lose Some","authors":"Gregory J. Cizek","doi":"10.1111/emip.12643","DOIUrl":"https://doi.org/10.1111/emip.12643","url":null,"abstract":"<p>In a 1993 EM:IP article, I made six predictions related to measurement policy issues for the approaching millenium. In this article, I evaluate the accuracy of those predictions (Spoiler: I was only modestly accurate) and I proffer a mix of seven contemporary predictions, recommendations, and aspirations regarding assessment generally, NCME as an association, and specific psychometric practices.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"126-136"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143245272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}