{"title":"Introduction to the Special Section on the Past, Present, and Future of Educational Measurement","authors":"Zhongmin Cui","doi":"10.1111/emip.12660","DOIUrl":"https://doi.org/10.1111/emip.12660","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"38-39"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI: Can You Help Address This Issue?","authors":"Deborah J. Harris","doi":"10.1111/emip.12655","DOIUrl":"https://doi.org/10.1111/emip.12655","url":null,"abstract":"<p>Linking across test forms or pools of items is necessary to ensure scores that are reported across different administrations are comparable and lead to consistent decisions for examinees whose abilities are the same, but who were administered different items. Most of these linkages consist of equating test forms or scaling calibrated items or pools to be on the same theta scale. The typical methodology to accomplish this linking makes use of common examinees or common items, where common examinees are understood to be groups of examinees of comparable ability, whether obtained through a single group (where the same examinees are administered multiple assessments) or a random groups design, where random assignment or pseudo random assignment is done (such as spiraling the test forms, say 1, 2, 3, 4, 5, and distributing them such that every 5th examinee receives the same form). Common item methodology is usually implemented by having identical items in multiple forms and using those items to link across forms or pools. These common items may be scored or unscored in terms of whether they are treated as internal or external anchors (i.e., whether they are contributing to the examinee's score).</p><p>There are situations where it is not practical to have either common examinees nor common items. Typically, these are high-stakes settings, where the security of the assessment questions would likely be at risk if any were repeated. This would include scenarios where the entire assessment is released after administration to promote transparency. In some countries, a single form of a national test may be administered to all examinees during a single administration time. While in some cases a student who does not do as well as they had hoped may retest the following year, this may be a small sample and these students would not be considered representative of the entire body of test-takers. In addition, it is presumed they would have spent the intervening year studying for the exam, and so they could not really be considered common examinees across years and assessment forms.</p><p>Although the decisions (such as university admissions) based on the assessment scores are comparable within the year, because all examinees are administered the same set of items on the same date, it is difficult to monitor trends over time as there is no linkage between forms across years. Although the general populations may be similar (e.g., 2024 secondary school graduates versus 2023 secondary school graduates), there is no evidence that the groups are strictly equivalent across years. 
Similarly, comparing how examinees perform across years (e.g., highest scores, average raw score, and so on) is challenging as there is no adjustment for yearly fluctuations in form difficulty across years.</p><p>There have been variations of both common item and common examinee linking, such as using similar items, rather than identical items, including where perhaps these similar items are","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"9-12"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12655","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
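The random groups design described above lends itself to a brief illustration. The following Python sketch (simulated data only, not code from the article) spirals five hypothetical forms across examinees and then applies simple mean equating to place Form B scores on the Form A scale; all numbers, form labels, and the equate_form_b_to_a helper are invented for the example.

# Sketch of a random groups (spiraled) design with mean equating.
# All data are simulated; this is not the procedure used in the article.
import numpy as np

rng = np.random.default_rng(7)

n_examinees = 5000
n_forms = 5

# Spiraling: hand out forms 1..5 in rotation so every 5th examinee
# receives the same form, yielding randomly equivalent groups.
form_assignment = np.arange(n_examinees) % n_forms

# Simulate number-correct scores; Form B (index 1) is somewhat harder.
true_ability = rng.normal(0, 1, n_examinees)
form_difficulty = np.array([0.0, -1.5, 0.5, 0.0, 0.2])  # shifts in raw-score units
raw_scores = (30 + 5 * true_ability
              + form_difficulty[form_assignment]
              + rng.normal(0, 2, n_examinees))

# Mean equating: because the spiraled groups are randomly equivalent,
# any difference in form means is attributed to form difficulty.
mean_a = raw_scores[form_assignment == 0].mean()
mean_b = raw_scores[form_assignment == 1].mean()
adjustment = mean_a - mean_b

def equate_form_b_to_a(score_on_b):
    """Place a Form B raw score on the Form A scale via mean equating."""
    return score_on_b + adjustment

print(f"Form A mean: {mean_a:.2f}, Form B mean: {mean_b:.2f}")
print(f"Form B score of 28 maps to {equate_form_b_to_a(28):.2f} on Form A")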
{"title":"Evolving Educational Testing to Meet Students’ Needs: Design-in-Real-Time Assessment","authors":"Stephen G. Sireci, Javier Suárez-Álvarez, April L. Zenisky, Maria Elena Oliveri","doi":"10.1111/emip.12653","DOIUrl":"https://doi.org/10.1111/emip.12653","url":null,"abstract":"<p>The goal in personalized assessment is to best fit the needs of each individual test taker, given the assessment purposes. Design-In-Real-Time (DIRTy) assessment reflects the progressive evolution in testing from a single test, to an adaptive test, to an adaptive assessment <i>system</i>. In this article, we lay the foundation for DIRTy assessment and illustrate how it meets the complex needs of each individual learner. The assessment framework incorporates culturally responsive assessment principles, thus making it innovative with respect to both technology and equity. Key aspects are (a) assessment building blocks called “assessment task modules” (ATMs) linked to multiple content standards and skill domains, (b) gathering information on test takers’ characteristics and preferences and using this information to improve their testing experience, and (c) selecting, modifying, and compiling ATMs to create a personalized test that best meets the needs of the testing purpose and individual test taker.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"112-118"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Analysis of Psychometric Frameworks and Properties of Scores from Autogenerated Test Forms","authors":"Won-Chan Lee, Stella Y. Kim","doi":"10.1111/emip.12648","DOIUrl":"https://doi.org/10.1111/emip.12648","url":null,"abstract":"<p>This paper explores the psychometric properties of scores derived from autogenerated test forms by introducing three conceptual frameworks: Alternate Test Forms, Randomly Parallel Forms, and Approximately Parallel Forms. Each framework provides a distinct perspective on score comparability, definitions of true score and standard error of measurement (SEM), and the necessity of equating. Through a simulation study, we illustrate how these frameworks compare in terms of true scores and SEMs, while also assessing the impact of equating on score comparability across varying levels of form variability. Ultimately, this study seeks to lay the groundwork for implementing scoring practices in large-scale standardized assessments that use autogenerated forms.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"13-23"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12648","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linking Unlinkable Tests: A Step Forward","authors":"Silvia Testa, Renato Miceli, Renato Miceli","doi":"10.1111/emip.12638","DOIUrl":"https://doi.org/10.1111/emip.12638","url":null,"abstract":"<p>Random Equating (RE) and Heuristic Approach (HA) are two linking procedures that may be used to compare the scores of individuals in two tests that measure the same latent trait, in conditions where there are no common items or individuals. In this study, RE—that may only be used when the individuals taking the two tests come from the same population—was used as a benchmark for evaluating HA, which, in contrast, does not require any distributional assumptions. The comparison was based on both simulated and empirical data. Simulations showed that HA was good at reproducing the link shift connecting the difficulty parameters of the two sets of items, performing similarly to RE under the condition of slight violation of the distributional assumption. Empirical results showed satisfactory correspondence between the estimates of item and person parameters obtained via the two procedures.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"66-72"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143424022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Mandated to Test-Optional College Admissions Testing: Where Do We Go from Here?","authors":"Kyndra V. Middleton, Comfort H. Omonkhodion, Ernest Y. Amoateng, Lucy O. Okam, Daniela Cardoza, Alexis Oakley","doi":"10.1111/emip.12649","DOIUrl":"https://doi.org/10.1111/emip.12649","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"33-37"},"PeriodicalIF":2.7,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Approaches to Controlling Item Position Effects in Computerized Adaptive Tests","authors":"Ye Ma, Deborah J. Harris","doi":"10.1111/emip.12637","DOIUrl":"https://doi.org/10.1111/emip.12637","url":null,"abstract":"<p>Item position effect (IPE) refers to situations where an item performs differently when it is administered in different positions on a test. The majority of previous research studies have focused on investigating IPE under linear testing. There is a lack of IPE research under adaptive testing. In addition, the existence of IPE might violate Item Response Theory (IRT)’s item parameter invariance assumption, which facilitates applications of IRT in various psychometric tasks such as computerized adaptive testing (CAT). Ignoring IPE might lead to issues such as inaccurate ability estimation in CAT. This article extends research on IPE by proposing and evaluating approaches to controlling position effects under an item-level computerized adaptive test via a simulation study. The results show that adjusting IPE via a pretesting design (approach 3) or a pool design (approach 4) results in better ability estimation accuracy compared to no adjustment (baseline approach) and item-level adjustment (approach 2). Practical implications of each approach as well as future research directions are discussed as well.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"44-54"},"PeriodicalIF":2.7,"publicationDate":"2024-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143424315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Module 36: Applying Intersectionality Theory to Educational Measurement","authors":"Michael Russell","doi":"10.1111/emip.12622","DOIUrl":"https://doi.org/10.1111/emip.12622","url":null,"abstract":"<div>\u0000 \u0000 <section>\u0000 \u0000 <h3> Module Abstract</h3>\u0000 \u0000 <p>Over the past decade, interest in applying Intersectionality Theory to quantitative analyses has grown. This module examines key concepts that form the foundation of Intersectionality Theory and considers challenges and opportunities these concepts present for quantitative methods. Two examples are presented to demonstrate how an intersectional approach to quantitative analyses differs from a traditional single-axis approach. The first example employs a linear regression technique to examine the efficacy of an educational intervention and to explore whether efficacy differs among subgroups of students. The second example compares findings when a differential item function analysis is conducted in a single-axis manner versus an intersectional lens. The module ends by exploring key considerations analysts and psychometricians encounter when applying Intersectionality Theory to a quantitative analysis.</p>\u0000 </section>\u0000 </div>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 3","pages":"106-108"},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12622","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142404751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demystifying Adequate Growth Percentiles","authors":"Katherine E. Castellano, Daniel F. McCaffrey, Joseph A. Martineau","doi":"10.1111/emip.12635","DOIUrl":"https://doi.org/10.1111/emip.12635","url":null,"abstract":"<p>Growth-to-standard models evaluate student growth against the growth needed to reach a future standard or target of interest, such as proficiency. A common growth-to-standard model involves comparing the popular Student Growth Percentile (SGP) to Adequate Growth Percentiles (AGPs). AGPs follow from an involved process based on fitting a series of nonlinear quantile regression models to longitudinal student test score data. This paper demystifies AGPs by deriving them in the more familiar linear regression framework. It further shows that unlike SGPs, AGPs and on-track classifications based on AGPs are strongly related to status. Lastly, AGPs are evaluated in terms of their classification accuracy. An empirical study and analytic derivations reveal AGPs can be problematic indicators of students’ future performance with previously not proficient students being more likely incorrectly flagged as not on-track and previously proficient students as on track. These classification errors have equity implications at the individual and school levels.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"31-43"},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Cover: Gendered Trajectories of Digital Literacy Development: Insights from a Longitudinal Cohort Study","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12625","DOIUrl":"https://doi.org/10.1111/emip.12625","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 3","pages":"6"},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12625","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142404750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}