{"title":"An Automated Item Pool Assembly Framework for Maximizing Item Utilization for CAT","authors":"Hwanggyu Lim, Kyung (Chris) T. Han","doi":"10.1111/emip.12589","DOIUrl":"10.1111/emip.12589","url":null,"abstract":"<p>Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of psychometric characteristics and content specifications. This study introduces the Honeycomb Pool Assembly (HPA) framework, an innovative solution for the construction of multiple parallel item pools for CAT that maximizes item utilization in the item bank. The HPA framework comprises two stages—cell assembly and pool assembly—and uses a mixed integer programming modeling approach. An empirical study demonstrated HPA's effectiveness in creating a large number of parallel pools using a real-world high-stakes CAT assessment item bank. The HPA framework offers several advantages, including (a) simultaneous creation of multiple parallel pools, (b) simplification of item pool maintenance, and (c) flexibility in establishing statistical and operational constraints. Moreover, it can help testing organizations efficiently manage and monitor the health of their item banks. Thus, the HPA framework is expected to be a valuable tool for testing professionals and organizations to address test security challenges and maintain the integrity of high-stakes CAT assessments.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 1","pages":"39-51"},"PeriodicalIF":2.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139894957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expected Classification Accuracy for Categorical Growth Models","authors":"Daniel Murphy, Sarah Quesen, Matthew Brunetti, Quintin Love","doi":"10.1111/emip.12599","DOIUrl":"10.1111/emip.12599","url":null,"abstract":"<p>Categorical growth models describe examinee growth in terms of performance-level category transitions, which implies that some percentage of examinees will be misclassified. This paper introduces a new procedure for estimating the classification accuracy of categorical growth models, based on Rudner's classification accuracy index for item response theory–based assessments. Results of a simulation study are presented to provide evidence for the accuracy and validity of the approach. Also, an empirical example is presented to demonstrate the approach using data from the Indiana Student Performance Readiness and Observation of Understanding Tool growth model, which classifies examinees into growth categories used by the Office of Special Education Programs to monitor the progress of preschool children who receive special education services.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 2","pages":"64-73"},"PeriodicalIF":2.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140487311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MxML (Exploring the Relationship between Measurement and Machine Learning): Current State of the Field","authors":"Yi Zheng, Steven Nydick, Sijia Huang, Susu Zhang","doi":"10.1111/emip.12593","DOIUrl":"https://doi.org/10.1111/emip.12593","url":null,"abstract":"<p>The recent surge of machine learning (ML) has impacted many disciplines, including educational and psychological measurement (hereafter shortened as <i>measurement</i>). The measurement literature has seen rapid growth in applications of ML to solve measurement problems. However, as we emphasize in this article, it is imperative to critically examine the potential risks associated with involving ML in measurement. The MxML project aims to explore the relationship between measurement and ML, so as to identify and address the risks and better harness the power of ML to serve measurement missions. This paper describes the first study of the MxML project, in which we summarize the state of the field of applications, extensions, and discussions about ML in measurement contexts with a systematic review of the recent 10 years’ literature. We provide a snapshot of the literature in (1) areas of measurement where ML is discussed, (2) types of articles (e.g., applications, conceptual, etc.), (3) ML methods discussed, and (4) potential risks associated with involving ML in measurement, which result from the differences between what measurement tasks need versus what ML techniques can provide.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 1","pages":"19-38"},"PeriodicalIF":2.0,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139987475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge Integration in Science Learning: Tracking Students' Knowledge Development and Skill Acquisition with Cognitive Diagnosis Models","authors":"Xin Xu, Shixiu Ren, Danhui Zhang, Tao Xin","doi":"10.1111/emip.12592","DOIUrl":"10.1111/emip.12592","url":null,"abstract":"<p>In scientific literacy, knowledge integration (KI) is a scaffolding-based theory to assist students' scientific inquiry learning. To drive students to be self-directed, many courses have been developed based on KI framework. However, few efforts have been made to evaluate the outcome of students' learning under KI instruction. Moreover, finer-grained information has been pursued to better understand students' learning and how it progresses over time. In this article, a normative procedure of building and choosing cognitive diagnosis models (CDMs) and attribute hierarchies was formulated under KI theory. We examined the utility of CDMs for evaluating students' knowledge status in KI learning. The results of the data analysis confirmed an intuitive assumption of the hierarchical structure of KI components. Furthermore, analysis of pre- and posttests using a higher-order, hidden Markov model tracked students' skill acquisition while integrating knowledge. Results showed that students make significant progress after using the web-based inquiry science environment (WISE) platform.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 1","pages":"66-82"},"PeriodicalIF":2.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139587965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Variability in Proctor Decision Making on High-Stakes Assessments: Improving Test Security in the Digital Age","authors":"William Belzak, J. R. Lockwood, Yigal Attali","doi":"10.1111/emip.12591","DOIUrl":"10.1111/emip.12591","url":null,"abstract":"<p>Remote proctoring, or monitoring test takers through internet-based, video-recording software, has become critical for maintaining test security on high-stakes assessments. The main role of remote proctors is to make judgments about test takers' behaviors and decide whether these behaviors constitute rule violations. Variability in proctor decision making, or the degree to which humans/proctors make different decisions about the same test-taking behaviors, can be problematic for both test takers and test users (e.g., universities). In this paper, we measure variability in proctor decision making over time on a high-stakes English language proficiency test. Our results show that (1) proctors systematically differ in their decision making and (2) these differences are trait-like (i.e., ranging from lenient to strict), but (3) systematic variability in decisions can be reduced. Based on these findings, we recommend that test security providers conduct regular measurements of proctors’ judgments and take actions to reduce variability in proctor decision making.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 1","pages":"52-65"},"PeriodicalIF":2.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12591","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139587843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using OpenAI GPT to Generate Reading Comprehension Items","authors":"Ayfer Sayin, Mark Gierl","doi":"10.1111/emip.12590","DOIUrl":"10.1111/emip.12590","url":null,"abstract":"<p>The purpose of this study is to introduce and evaluate a method for generating reading comprehension items using template-based automatic item generation. To begin, we describe a new model for generating reading comprehension items called the text analysis cognitive model assessing inferential skills across different reading passages. Next, the text analysis cognitive model is used to generate reading comprehension items where examinees are required to read a passage and identify the irrelevant sentence. The sentences for the generated passages were created using OpenAI GPT-3.5. Finally, the quality of the generated items was evaluated. The generated items were reviewed by three subject-matter experts. The generated items were also administered to a sample of 1,607 Grade-8 students. The correct options for the generated items produced a similar level of difficulty and yielded strong discrimination power while the incorrect options served as effective distractors. Implications of augmented intelligence for item development are discussed.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 1","pages":"5-18"},"PeriodicalIF":2.0,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12590","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139587888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achievement and Growth on English Language Proficiency and Content Assessments for English Learners in Elementary Grades","authors":"Heather M Buzick, Mikyung Kim Wolf, Laura Ballard","doi":"10.1111/emip.12588","DOIUrl":"10.1111/emip.12588","url":null,"abstract":"<p>English language proficiency (ELP) assessment scores are used by states to make high-stakes decisions related to linguistic support in instruction and assessment for English learner (EL) students and for EL student reclassification. Changes to both academic content standards and ELP academic standards within the last decade have resulted in increased academic rigor and language demands. In this study, we explored the association between EL student performance over time on content (English language arts and mathematics) and ELP assessments, generally finding evidence of positive associations. Modeling the simultaneous association between changes over time in both content and ELP assessment performance contributes empirical evidence about the role of language in ELA and mathematics development and provides contextual information to serve as validity evidence for score inferences for EL students.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"43 1","pages":"83-95"},"PeriodicalIF":2.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139464311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ITEMS Corner Update: The Final Three Steps in the Development Process","authors":"Brian C. Leventhal","doi":"10.1111/emip.12586","DOIUrl":"https://doi.org/10.1111/emip.12586","url":null,"abstract":"<p>Throughout 2023, I have detailed each step of the module development process for the <i>Instructional Topics in Educational Measurement Series</i> (<i>ITEMS</i>). In the first issue, I outlined the 10 steps necessary to complete a module. In the second issue, I detailed Steps 1–3, which cover outlining the content, developing the content in premade PowerPoint templates, and having the slides reviewed by the editor. In the third issue of the year, I outlined Step 4—recording the audio, Step 5—having the editor polish the module (e.g., animating the content), Step 6—building the activity, and Step 7—building interactive learning checks (i.e., selected response questions designed to check for understanding). In this issue, I elaborate on the final three steps: Step 8—external review, Step 9—building the module on the portal, and Step 10—writing the piece to be published in <i>Educational Measurement: Issues and Practice</i> (<i>EM:IP</i>). Following the in-depth explanation of each of these steps, I then introduce the newest module published to the <i>ITEMS</i> portal (https://www.ncme.org/ITEMSportal).</p><p>Authors may opt to have their module externally reviewed (Step 8) prior to recording audio (Step 4) or after the module has been polished (Step 5). Having the module content reviewed prior to recording audio allows for modifying content easily without having to do “double” work (e.g., rerecording audio on slides, reorganizing flow charts). However, many authors find that their bulleted notes for each slide are not sufficient for reviewers to understand the final product. Alternatively, they may opt to have their module sent out for review once it has been editorially polished. This lets reviewers watch the author's “final” product. Because the reviewers may suggest updates, I request authors record audio on each slide. Should an author choose to make a change after review, they then do not have to rerecord an entire 20-minute section of audio. Reviewers are instructed to provide constructive feedback and are given insights about the full process that authors have already worked through (i.e., the ten-step process). It is emphasized that the purpose of <i>ITEMS</i> is not to present novel cutting-edge research. Rather, it is a publication designed to provide instructional resources on current practices in the field.</p><p>After receiving reviewer feedback, authors are provided an opportunity to revise their module. Similar to a manuscript revise and resubmission, authors are asked to respond to each reviewer's comment, articulating how they have addressed each. This serves an additional purpose; specifically, this assists the editor in repolishing the updated module. For example, if audio is rerecorded on a slide, the editor will need to adjust animations and timing. After the editor has made final updates, the author reviews the module to give final approval. 
Upon receiving approval, the editor then builds the module onto the NCME website <i","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 4","pages":"81"},"PeriodicalIF":2.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12586","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138485181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Cover: Tell-Tale Triangles of Subscore Value","authors":"Yuan-Ling Liaw","doi":"10.1111/emip.12587","DOIUrl":"https://doi.org/10.1111/emip.12587","url":null,"abstract":"","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":"42 4","pages":"4"},"PeriodicalIF":2.0,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138485202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}