{"title":"On the Cover: The Increasing Impact of EM:IP","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12657","DOIUrl":"https://doi.org/10.1111/emip.12657","url":null,"abstract":"<p>The cover of this issue featured “The Increasing Impact of <i>EM:IP</i>” by Zhongmin Cui, the journal's editor. Cui elaborated on the significance of the impact factor for Educational Measurement: Issues and Practice (<i>EM:IP</i>), one of the most widely recognized metrics for evaluating a journal's influence and prestige. The impact factor, which measures how frequently a journal's articles are cited over a specific period, serves as a critical tool for researchers, institutions, and funding bodies in assessing the relevance and significance of published work.</p><p>Cui noted the challenges in measuring a journal's influence, stating, “As measurement professionals, we are well aware of the difficulties in quantifying almost anything, including the impact of a journal. However, even imperfect metrics, if carefully designed, can provide valuable insights for users making informed decisions.”</p><p>He cited <i>EM:IP</i>’s latest journal impact factor of 2.7 (Wiley, <span>2024</span>), which was calculated based on citations from the previous two years. Acknowledging that this figure might not seem substantial, Cui emphasized that it represents a significant milestone in the journal's history. “The visualization we created illustrates a steady, consistent upward trend in <i>EM:IP</i>’s impact factor over the past decade. This growth reflects our ongoing commitment to publishing high-quality, impactful research that resonates with both scholars and practitioners,” he added.</p><p>Cui also stressed the growing influence of <i>EM:IP</i> in the field of educational and psychological measurement. He credited this achievement to the dedication of the authors, the insights of the reviewers, and the ongoing support of the readers. “Everyone's contributions have been crucial to our success, and we are excited to continue our mission to advance knowledge and foster scholarly discourse in the years ahead,” he expressed with gratitude.</p><p>The visualization was created using Python, following guidelines established by Setzer and Cui (<span>2022</span>). “One special feature of the graph is the use of the journal's color scheme, which enhances visual harmony, particularly for the cover design,” Cui explained. The data used to calculate the impact factor was sourced from Clarivate (https://clarivate.com/). For those interested in learning more about this data visualization, Zhongmin Cui can be contacted at [email protected].</p><p>We also invite you to participate in the annual <i>EM:IP</i> Cover Graphic/Data Visualization Competition. Details for the 2025 competition can be found in this issue. Your entry could be featured on the cover of a future issue! We're eager to receive your feedback and submissions. 
Please share your thoughts or questions by emailing Yuan-Ling Liaw at [email protected].</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"7"},"PeriodicalIF":2.7,"publicationDate":"2025-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12657","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
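For readers unfamiliar with the metric behind Cui's cover graphic, the two-year journal impact factor is simply the citations received in a given year to articles published in the preceding two years, divided by the number of citable items published in those two years. A minimal Python sketch of that arithmetic follows; the counts are illustrative placeholders, not EM:IP's actual data, and Clarivate's operational procedure involves additional editorial rules about what counts as a citable item.

```python
# Minimal sketch of the two-year journal impact factor arithmetic.
# The counts below are illustrative placeholders, not EM:IP's actual data;
# the published 2.7 figure comes from Clarivate's citation database.

def impact_factor(citations_to_prior_two_years: int,
                  citable_items_prior_two_years: int) -> float:
    """IF for year Y = citations in Y to items published in Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_to_prior_two_years / citable_items_prior_two_years

# Hypothetical example: 162 citations to 60 citable items gives 2.7.
print(round(impact_factor(162, 60), 1))
```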
{"title":"Current Psychometric Models and Some Uses of Technology in Educational Testing","authors":"Robert L. Brennan","doi":"10.1111/emip.12644","DOIUrl":"https://doi.org/10.1111/emip.12644","url":null,"abstract":"<p>This paper addresses some issues concerning the use of current psychometric models for current (and possibly future) technology-based educational testing (as well as most licensure and certification testing). The intent here is to provide a relatively simple overview that addresses important issues, with little explicit intent to argue strenuously for or against the particular uses of technology discussed here.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"88-92"},"PeriodicalIF":2.7,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12644","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instruction-Tuned Large-Language Models for Quality Control in Automatic Item Generation: A Feasibility Study","authors":"Guher Gorgun, Okan Bulut","doi":"10.1111/emip.12663","DOIUrl":"https://doi.org/10.1111/emip.12663","url":null,"abstract":"<p>Automatic item generation may supply many items instantly and efficiently to assessment and learning environments. Yet, the evaluation of item quality persists to be a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of using large-language models, specifically Llama 3-8B, for evaluating automatically generated cloze items. The trained large-language model was able to filter out majority of good and bad items accurately. Evaluating items automatically with instruction-tuned LLMs may aid educators and test developers in understanding the quality of items created in an efficient and scalable manner. The item evaluation process with LLMs may also act as an intermediate step between item creation and field testing to reduce the cost and time associated with multiple rounds of revision.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"96-107"},"PeriodicalIF":2.7,"publicationDate":"2024-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12663","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143423870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Still Interested in Multidimensional Item Response Theory Modeling? Here Are Some Thoughts on How to Make It Work in Practice","authors":"Terry A. Ackerman, Richard M. Luecht","doi":"10.1111/emip.12645","DOIUrl":"https://doi.org/10.1111/emip.12645","url":null,"abstract":"<p>Given tremendous improvements over the past three to four decades in the computational methods and computer technologies needed to estimate the parameters for higher dimensionality models (Cai, <span>2010a, 2010b</span>, <span>2017</span>), we might expect that MIRT would by now be a widely used array of models and psychometric software tools being used operationally in many educational assessment settings. Perhaps one of the few areas where MIRT has helped practitioners is in the area of understanding Differential Item Functioning (DIF) (Ackerman & Ma, <span>2024</span>; Camilli, <span>1992</span>; Shealy & Stout, <span>1993</span>). Nevertheless, the expectation has not been met nor do there seem to be many operational initiatives to change the <i>status quo</i>.</p><p>Some research psychometricians might lament the lack of large-scale applications of MIRT in the field of educational assessment. However, the simple fact is that MIRT has not lived up to its early expectations nor its potential due to several barriers. Following a discussion of test purpose and metric design issues in the next section, we will examine some of the barriers associated with these topics and provide suggestions for overcoming or completely avoiding them.</p><p>Tests developed for one purpose are rarely of much utility for another purpose. For example, professional certification and licensure tests designed to optimize pass-fail classifications are often not very useful for reporting scores across a large proficiency range—at least not unless the tests are extremely long. Summative, and most interim assessments used in K–12 education, are usually designed to produce reliable total-test scores. The resulting scale scores are summarized as descriptive statistical aggregations of scale scores or other functions of the scores such as classifying students in ordered achievement levels (e.g., Below Basic, Basic, Proficient, Advanced), or in modeling student growth in a subject area as part of an educational accountability system. Some commercially available online “interim” assessments provide limited progress-oriented scores and subscores from on-demand tests. However, the defensible formative utility of most interim assessments remains limited because test development and psychometric analytics follow the summative assessment test design and development paradigm: focusing on maintaining vertically aligned or equated, unidimensional scores scales (e.g., a K–12 math scale).</p><p>The requisite test design and development frameworks for summative tests focus on the relationships between the item responses and the total test score scale (e.g., maximizing item-total score correlations and the conditional reliability within prioritized regions of that score scale).</p><p>Applying MIRT models to most summative or interim assessments makes little sense. 
The problem is that we continue to allow policymakers to make claims about score interpretations that are not support","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"93-100"},"PeriodicalIF":2.7,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12645","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
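For concreteness, the model family under discussion can be written as the compensatory multidimensional 2PL, in which the response probability depends on a weighted combination of several latent traits rather than a single unidimensional score. A minimal sketch of that item response function follows; the parameter values are purely illustrative and are not drawn from any operational assessment.

```python
# Minimal sketch of a compensatory multidimensional IRT (M2PL) item response function:
# P(X = 1 | theta) = 1 / (1 + exp(-(a . theta + d))), with slope vector a and intercept d.
# Parameter values below are illustrative, not estimates from any real test.
import numpy as np

def m2pl_prob(theta: np.ndarray, a: np.ndarray, d: float) -> float:
    """Probability of a correct response for ability vector theta."""
    return 1.0 / (1.0 + np.exp(-(a @ theta + d)))

theta = np.array([0.5, -0.2])   # two latent dimensions (e.g., two content strands)
a = np.array([1.2, 0.4])        # discrimination along each dimension
d = -0.3                        # easiness intercept
print(round(m2pl_prob(theta, a, d), 3))
```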
{"title":"Personalizing Assessment: Dream or Nightmare?","authors":"Randy E. Bennett","doi":"10.1111/emip.12652","DOIUrl":"https://doi.org/10.1111/emip.12652","url":null,"abstract":"<p>Over our field's 100-year-plus history, standardization has been a central assumption in test theory and practice. The concept's justification turns on leveling the playing field by presenting all examinees with putatively equivalent experiences. Until relatively recently, our field has accepted that justification almost without question. In this article, I present a case for standardization's antithesis, personalization. Interestingly, personalized assessment has important precedents within the measurement community. As intriguing are some of the divergent ways in which personalization might be realized in practice. Those ways, however, suggest a host of serious issues. Despite those issues, both moral obligation and survival imperative counsel persistence in trying to personalize assessment.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"119-125"},"PeriodicalIF":2.7,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143248372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Process Data to Evaluate the Impact of Shortening Allotted Case Time in a Simulation-Based Assessment","authors":"Chunyan Liu, Monica M. Cuddy, Qiwei He, Wenli Ouyang, Cara Artman","doi":"10.1111/emip.12656","DOIUrl":"https://doi.org/10.1111/emip.12656","url":null,"abstract":"<p>The Computer-based Case Simulations (CCS) component of the United States Medical Licensing Examination (USMLE) Step 3 was developed to assess the decision-making and patient-management skills of physicians. Process data can provide deep insights into examinees’ behavioral processes related to completing the CCS assessment task. In this paper, we utilized process data to evaluate the impact of shortening allotted time limit by rescoring the CCS cases based on process data extracted at various timestamps that represented different percentages of the original allotted case time. It was found that examinees’ performance as well as the correlation between original and newly generated scores both tended to decrease as the timestamp condition became stricter. The impact of shortening allotted time limit was found marginally associated with case difficulties, but strongly dependent on the case time intensity under the original time setting.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"24-32"},"PeriodicalIF":2.7,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What Makes Measurement Important for Education?","authors":"Mark Wilson","doi":"10.1111/emip.12646","DOIUrl":"https://doi.org/10.1111/emip.12646","url":null,"abstract":"<p>This contribution to the Special Issue of <i>EM:IP</i> on the topic of <i>The Past, Present and Future of Educational Measurement</i> concentrates on the present and the future and hence focuses on the goal of improving education. The results of meta-analyses were examined, and it was noted that the largest effect sizes were associated with actual use of formative assessments in classroom settings—hence <i>classroom assessment</i> (in contrast with <i>large-scale assessment</i>). The paper describes micro assessment, which focuses on in-classroom forms of measurement, and then expands this assessment approach to focus on frames beyond that in terms of summative end-of-semester tests (macro). This is followed by a description of how these approaches can be combined using a construct map as the basis for developing and using assessments to span across these two levels in terms of the BEAR Assessment System (BAS). Throughout, this is exemplified using an elementary school program designed to teach students about geometry. Finally, a conclusion summarizes the discussion, and also looks to the future where a meso level of use involves end-of-unit tests.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 4","pages":"73-82"},"PeriodicalIF":2.7,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12646","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Growth across Grades and Common Item Grade Alignment in Vertical Scaling Using the Rasch Model","authors":"Sanford R. Student, Derek C. Briggs, Laurie Davis","doi":"10.1111/emip.12639","DOIUrl":"https://doi.org/10.1111/emip.12639","url":null,"abstract":"<p>Vertical scales are frequently developed using common item nonequivalent group linking. In this design, one can use upper-grade, lower-grade, or mixed-grade common items to estimate the linking constants that underlie the absolute measurement of growth. Using the Rasch model and a dataset from Curriculum Associates’ i-Ready Diagnostic in math in grades 3–7, we demonstrate how grade-to-grade mean differences in mathematics proficiency appear much larger when upper-grade linking items are used instead of lower-grade items, with linkings based on a mixture of items falling in between. We then consider salient properties of the three calibrated scales including invariance of the different sets of common items to student grade and item difficulty reversals. These exploratory analyses suggest that upper-grade common items in vertical scaling are more subject to threats to score comparability across grades, even though these items also tend to imply the most growth.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"44 1","pages":"84-95"},"PeriodicalIF":2.7,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143424128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}