{"title":"Automated Scoring in Learning Progression-Based Assessment: A Comparison of Researcher and Machine Interpretations","authors":"Hui Jin, Cynthia Lima, Limin Wang","doi":"10.1111/emip.70003","DOIUrl":"https://doi.org/10.1111/emip.70003","url":null,"abstract":"<p>Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models’ language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated scoring was applied to five science items. Results indicate that including item descriptions prior to student responses provides additional contextual information to the transformer model, allowing it to generate automated scoring models with improved performance. These automated scoring models achieved scoring accuracy comparable to human raters. However, they struggle to evaluate responses that contain complex scientific terminology and to interpret responses that contain unusual symbols, atypical language errors, or logical inconsistencies. These findings underscore the importance of the efforts from both researchers and teachers in advancing the accuracy, fairness, and effectiveness of automated scoring.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 3","pages":"25-37"},"PeriodicalIF":1.9,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145012936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Effect of Human Error When Using Expert Judgments to Train an Automated Scoring System","authors":"Stephanie Iaccarino, Brian E. Clauser, Polina Harik, Peter Baldwin, Yiyun Zhou, Michael T. Kane","doi":"10.1111/emip.70002","DOIUrl":"https://doi.org/10.1111/emip.70002","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 3","pages":"15-24"},"PeriodicalIF":1.9,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145012067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital Module 39: Introduction to Generalizability Theory","authors":"Won-Chan Lee, Stella Y. Kim, Qiao Liu, Seungwon Shin","doi":"10.1111/emip.70001","DOIUrl":"https://doi.org/10.1111/emip.70001","url":null,"abstract":"<div>\u0000 \u0000 <section>\u0000 \u0000 <h3> Module Abstract</h3>\u0000 \u0000 <p>Generalizability theory (GT) is a widely used framework in the social and behavioral sciences for assessing the reliability of measurements. Unlike classical test theory, which treats measurement error as a single undifferentiated term, GT enables the decomposition of error into multiple distinct components. This module introduces the core principles and applications of GT, with a focus on the univariate framework. The first four sections cover foundational concepts, including key terminology, common design structures, and the estimation of variance components. The final two sections offer hands-on examples using real data, implemented in R and GENOVA software. By the end of the module, participants will have a solid understanding of GT and the ability to conduct basic GT analyses using statistical software.</p>\u0000 </section>\u0000 </div>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 3","pages":"38-39"},"PeriodicalIF":1.9,"publicationDate":"2025-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.70001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145012353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Cover: Illustrating Collusion Networks with Graph Theory","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.70000","DOIUrl":"https://doi.org/10.1111/emip.70000","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 3","pages":""},"PeriodicalIF":1.9,"publicationDate":"2025-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145013243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case for Reimagining Universal Design of Assessment Systems","authors":"Cara Cahalan Laitusis, Meagan Karvonen","doi":"10.1111/emip.12674","DOIUrl":"https://doi.org/10.1111/emip.12674","url":null,"abstract":"<p>The 2014 <i>Standards for Educational and Psychological Testing</i> describe universal design as an approach that offers promise for improving the fairness of educational assessments. As the field reconsiders questions of fairness in assessments, we propose a new framework that addresses the entire assessment lifecycle: universal design of assessment systems. This framework is rooted in the original Universal Design principles but extends beyond test design and administration to the entire assessment lifecycle, from construct definition to score interpretation and use. Another core tenet within this framework is the integration of psychological theory on universal human needs for autonomy, competence, and relatedness with flexibility based on our contemporary understandings of neurodiversity, culture, and multilingualism. Finally, the framework integrates the original <i>Universal Design</i> principle of <i>tolerance for error</i>, which promotes assessment designs that anticipate unintended actions and mitigate potential harms. After describing how the principles and guidelines might apply in contexts ranging from classroom assessments to statewide assessments and licensure exams, we conclude with practical implications and next steps. We hope future versions of the <i>Standards for Educational and Psychological Testing</i> incorporate this broader, systems-wide approach to universal design.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 3","pages":"5-14"},"PeriodicalIF":1.9,"publicationDate":"2025-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12674","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145013161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Cover: Sequential Progression and Item Review in Timed Tests: Patterns in Process Data","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12670","DOIUrl":"https://doi.org/10.1111/emip.12670","url":null,"abstract":"<p>We are excited to announce the winners of the 12th <i>EM:IP</i> Cover Graphic/Data Visualization Competition. Each year, we invite our readers to submit visualizations that are not only accurate and insightful but also visually compelling and easy to understand. This year's submissions explored key topics in educational measurement, including process data, item characteristics, test design, and score interpretation. We extend our sincere thanks to everyone who submitted their work, and we are especially grateful to the <i>EM:IP</i> editorial board for their thoughtful review and feedback in the selection process.</p><p>Winning entries may be featured on the cover of a future <i>EM:IP</i> issue. Previous winners who have not yet appeared on a cover remain eligible for upcoming issues.</p><p>This issue's cover features Sequential Progression and Item Review in Timed Tests: Patterns in Process Data, a compelling visualization created by Christian Meyer from the Association of American Medical Colleges and the University of Maryland, along with Ying Jin and Marc Kroopnick, both from the Association of American Medical Colleges.</p><p>The visualization, developed using R, presents smoothed density plots derived from process data collected during a high-stakes admissions test. It illustrates how examinees navigated one section of the test within a 95-minute time limit. The <i>x</i>-axis represents elapsed time in minutes. The <i>y</i>-axis segments item positions into five groups: 1 to 15, 16 to 25, 26 to 35, 36 to 45, and 46 to 59. Meyer and his colleagues explain that, for each item group, the height of the plot indicates density. The supports of the estimated densities extend beyond the start and end of the test to allow the plots to approach zero smoothly at the extremes.</p><p>Color is used effectively to distinguish between initial engagement and item review. Blue areas indicate when items were first viewed, while red areas show when examinees revisited those same items. The authors describe, “The figure illustrates a common test-taking strategy: examinees initially progress sequentially through the test, as shown by the early blue density peaks for each group. Toward the end of the session, they frequently revisit earlier items, as evidenced by the red peaks clustering near the time limit.” This pattern reflects deliberate time management, with examinees dividing their approach into two distinct phases.</p><p>They continue, “In the first phase, they assess each item, either attempting a response or skipping it for later review. In the second phase, they revisit skipped or uncertain items, providing more considered answers when time permits or resorting to random guessing if necessary.”</p><p>According to Meyer and his colleagues, the visualization offers valuable insight into examinees’ time management and engagement strategies during timed tests. 
They conclude, “It captures temporal strategies, such as sequential progression and end-of-sessi","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 2","pages":""},"PeriodicalIF":2.7,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12670","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
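For readers who want to build a plot in this style, here is a minimal ggplot2 sketch in R. The simulated event log `log_df` and its columns are hypothetical stand-ins for the (non-public) admissions-test process data, and the density settings are not the authors' actual specification.

```r
# Smoothed densities of item-view timestamps, split by item-position
# group and colored by first view (blue) vs. revisit (red).
# `log_df` is simulated: one row per item-view event.
library(ggplot2)

set.seed(1)
n <- 2000
log_df <- data.frame(
  elapsed_min = c(runif(n, 0, 70), runif(n / 2, 60, 95)),
  item_group  = sample(c("1-15", "16-25", "26-35", "36-45", "46-59"),
                       n + n / 2, replace = TRUE),
  event       = rep(c("first view", "revisit"), c(n, n / 2))
)

ggplot(log_df, aes(x = elapsed_min, fill = event)) +
  geom_density(alpha = 0.5, color = NA) +
  facet_grid(rows = vars(item_group)) +
  scale_fill_manual(values = c("first view" = "steelblue",
                               "revisit"    = "firebrick")) +
  labs(x = "Elapsed time (minutes)", y = "Density",
       title = "Item views over a 95-minute timed section")
```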
{"title":"Digital Module 38: Differential Item Functioning by Multiple Variables Using Moderated Nonlinear Factor Analysis","authors":"Sanford R. Student, Ethan M. McCormick","doi":"10.1111/emip.12669","DOIUrl":"https://doi.org/10.1111/emip.12669","url":null,"abstract":"<div>\u0000 \u0000 <section>\u0000 \u0000 <h3> Module Abstract</h3>\u0000 \u0000 <p>When investigating potential bias in educational test items via differential item functioning (DIF) analysis, researchers have historically been limited to comparing two groups of students at a time. The recent introduction of Moderated Nonlinear Factor Analysis (MNLFA) generalizes Item Response Theory models to extend the assessment of DIF to an arbitrary number of background variables. This facilitates more complex analyses such as DIF across more than two groups (e.g. low/middle/high socioeconomic status), across more than one background variable (e.g. DIF by race/ethnicity and gender), across non-categorical background variables (e.g. DIF by parental income), and more. Framing MNLFA as a generalization of the two-parameter logistic IRT model, we introduce the model with an emphasis on the parameters representing DIF versus impact; describe the current state of the art for estimating MNLFA models; and illustrate the application of MNLFA in a scenario where one wants to test for DIF across two background variables at once.</p>\u0000 </section>\u0000 </div>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 2","pages":"39-41"},"PeriodicalIF":2.7,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2024 NCME Presidential Address: Challenging Traditional Views of Measurement","authors":"Michael E. Walker","doi":"10.1111/emip.12673","DOIUrl":"https://doi.org/10.1111/emip.12673","url":null,"abstract":"<p>This article is adapted from the 2024 NCME Presidential Address. It reflects a personal journey to challenge traditional views of measurement. Considering alternative viewpoints with an open mind led to several solutions to perplexing problems at the time. The article discusses the culture-boundedness of measurement and the need to take that into consideration when designing tests.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 2","pages":"32-38"},"PeriodicalIF":2.7,"publicationDate":"2025-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}