{"title":"Argument-Based Approach to Validity: Developing a Living Document and Incorporating Preregistration","authors":"Daria Gerasimova","doi":"10.1111/jedm.12385","DOIUrl":"10.1111/jedm.12385","url":null,"abstract":"<p>I propose two practical advances to the argument-based approach to validity: developing a living document and incorporating preregistration. First, I present a potential structure for the living document that includes an up-to-date summary of the validity argument. As the validation process may span across multiple studies, the living document allows future users of the instrument to access the entire validity argument in one place. Second, I describe how preregistration can be incorporated in the argument-based approach. Specifically, I distinguish between two types of preregistration: preregistration of the argument and preregistration of validation studies. Preregistration of the argument is a single preregistration that is specified for the entire validation process. Here, the developer specifies interpretations, uses, and claims before collecting validity evidence. Preregistration of a validation study refers to preregistering a single validation study that aims to evaluate a set of claims. Here, the developer describes study components (e.g., research design, data collection, data analysis, etc.), before collecting data. Both preregistration types have the potential to reduce the risk of bias (e.g., hindsight and confirmation biases), as well as to allow others to evaluate the risk of bias and, hence, calibrate confidence, in the developer's evaluation of the validity argument.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 2","pages":"252-273"},"PeriodicalIF":1.3,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139837198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dual-Purpose Model for Binary Data: Estimating Ability and Misconceptions","authors":"Wenchao Ma, Miguel A. Sorrel, Xiaoming Zhai, Yuan Ge","doi":"10.1111/jedm.12383","DOIUrl":"10.1111/jedm.12383","url":null,"abstract":"<p>Most existing diagnostic models are developed to detect whether students have mastered a set of skills of interest, but few have focused on identifying what scientific misconceptions students possess. This article developed a general dual-purpose model for simultaneously estimating students' overall ability and the presence and absence of misconceptions. The expectation-maximization algorithm was developed to estimate the model parameters. A simulation study was conducted to evaluate to what extent the parameters can be accurately recovered under varied conditions. A set of real data in science education was also analyzed to examine the viability of the proposed model in practice.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 2","pages":"179-197"},"PeriodicalIF":1.3,"publicationDate":"2024-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139373800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Highly Adaptive Testing Design for PISA","authors":"Andreas Frey, Christoph König, Aron Fink","doi":"10.1111/jedm.12382","DOIUrl":"https://doi.org/10.1111/jedm.12382","url":null,"abstract":"The highly adaptive testing (HAT) design is introduced as an alternative test design for the Programme for International Student Assessment (PISA). The principle of HAT is to be as adaptive as possible when selecting items while accounting for PISA's nonstatistical constraints and addressing issues concerning PISA such as item position effects. HAT combines established methods from the field of computerized adaptive testing. It is implemented in R and code is provided. HAT was compared to the PISA 2018 multistage design (MST) in a simulation study based on a factorial design with the independent variables response probability (RP; .50, .62), item pool optimality (PISA 2018, optimal), and ability level (low, medium, high). PISA-specific conditions regarding sample size, missing responses, and nonstatistical constraints were implemented. HAT clearly outperformed MST regarding test information, RMSE, and constraint management across ability groups but it showed slightly weaker item exposure. Raising RP to .62 did not decrease test information much and is therefore a viable option to foster students’ test-taking experience with HAT. Test information for HAT was up to three times higher than for MST when using a hypothetical optimal item pool. Summarizing, HAT proved to be a promising and applicable test design for PISA.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"13 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138539704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Highly Adaptive Testing Design for PISA","authors":"Andreas Frey, Christoph König, Aron Fink","doi":"10.1111/jedm.12382","DOIUrl":"https://doi.org/10.1111/jedm.12382","url":null,"abstract":"The highly adaptive testing (HAT) design is introduced as an alternative test design for the Programme for International Student Assessment (PISA). The principle of HAT is to be as adaptive as possible when selecting items while accounting for PISA's nonstatistical constraints and addressing issues concerning PISA such as item position effects. HAT combines established methods from the field of computerized adaptive testing. It is implemented in R and code is provided. HAT was compared to the PISA 2018 multistage design (MST) in a simulation study based on a factorial design with the independent variables response probability (RP; .50, .62), item pool optimality (PISA 2018, optimal), and ability level (low, medium, high). PISA-specific conditions regarding sample size, missing responses, and nonstatistical constraints were implemented. HAT clearly outperformed MST regarding test information, RMSE, and constraint management across ability groups but it showed slightly weaker item exposure. Raising RP to .62 did not decrease test information much and is therefore a viable option to foster students’ test-taking experience with HAT. Test information for HAT was up to three times higher than for MST when using a hypothetical optimal item pool. Summarizing, HAT proved to be a promising and applicable test design for PISA.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"13 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138539665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation and Accuracy Evaluation of Comparable Scores on Culturally Responsive Assessments","authors":"Sandip Sinharay, Matthew S. Johnson","doi":"10.1111/jedm.12381","DOIUrl":"10.1111/jedm.12381","url":null,"abstract":"<p>Culturally responsive assessments have been proposed as potential tools to ensure equity and fairness for examinees from all backgrounds including those from traditionally underserved or minoritized groups. However, these assessments are relatively new and, with few exceptions, are yet to be implemented in large scale. Consequently, there is a lack of guidance on how one can compute comparable scores on various versions of these assessments. In this paper, the multigroup multidimensional Rasch model is repurposed for modeling data originating from various versions of a culturally responsive assessment and for analyzing such data to compute comparable scores. Two simulation studies are performed to evaluate the performance of the model for data simulated from hypothetical culturally responsive assessments and to find the conditions under which the computed scores are accurate. Recommendations are made for measurement practitioners interested in culturally responsive assessments.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 1","pages":"5-46"},"PeriodicalIF":1.3,"publicationDate":"2023-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138539727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation and Accuracy Evaluation of Comparable Scores on Culturally Responsive Assessments","authors":"Sandip Sinharay, Matthew S. Johnson","doi":"10.1111/jedm.12381","DOIUrl":"https://doi.org/10.1111/jedm.12381","url":null,"abstract":"Culturally responsive assessments have been proposed as potential tools to ensure equity and fairness for examinees from all backgrounds including those from traditionally underserved or minoritized groups. However, these assessments are relatively new and, with few exceptions, are yet to be implemented in large scale. Consequently, there is a lack of guidance on how data on how one can compute comparable scores on various versions of these assessments. In this paper, the multigroup multidimensional Rasch model is repurposed for modeling data originating from various versions of a culturally responsive assessment and for analyzing such data to compute comparable scores. Two simulation studies are performed to evaluate the performance of the model for data simulated from hypothetical culturally responsive assessments and to find the conditions under which the computed scores are accurate. Recommendations are made for measurement practitioners interested in culturally responsive assessments.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"44 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138539685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Test‐Taking Engagement into Multistage Adaptive Testing Design for Large‐Scale Assessments","authors":"Okan Bulut, Guher Gorgun, Hacer Karamese","doi":"10.1111/jedm.12380","DOIUrl":"https://doi.org/10.1111/jedm.12380","url":null,"abstract":"Abstract The use of multistage adaptive testing (MST) has gradually increased in large‐scale testing programs as MST achieves a balanced compromise between linear test design and item‐level adaptive testing. MST works on the premise that each examinee gives their best effort when attempting the items, and their responses truly reflect what they know or can do. However, research shows that large‐scale assessments may suffer from a lack of test‐taking engagement, especially if they are low stakes. Examinees with low test‐taking engagement are likely to show noneffortful responding (e.g., answering the items very rapidly without reading the item stem or response options). To alleviate the impact of noneffortful responses on the measurement accuracy of MST, test‐taking engagement can be operationalized as a latent trait based on response times and incorporated into the on‐the‐fly module assembly procedure. To demonstrate the proposed approach, a Monte‐Carlo simulation study was conducted based on item parameters from an international large‐scale assessment. The results indicated that the on‐the‐fly module assembly considering both ability and test‐taking engagement could minimize the impact of noneffortful responses, yielding more accurate ability estimates and classifications. Implications for practice and directions for future research were discussed.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"119 52","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135137584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Functions of Rank-2PL Models for Forced-Choice Questionnaires","authors":"Jianbin Fu, Xuan Tan, Patrick C. Kyllonen","doi":"10.1111/jedm.12379","DOIUrl":"10.1111/jedm.12379","url":null,"abstract":"<p>This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's information and directional information are described, and the test information for Maximum Likelihood (ML), Maximum A Posterior (MAP), and Expected A Posterior (EAP) trait score estimates is distinguished. Expected item/test information indexes at various levels are proposed and plotted to provide diagnostic information on items and tests. The expected test information indexes for EAP scores may be difficult to compute due to a typical test's vast number of item response patterns. The relationships of item/test information with discrimination parameters of statements, standard error, and reliability estimates of trait score estimates are discussed and demonstrated using real data. Practical suggestions for checking the various expected item/test information indexes and plots are provided.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 1","pages":"125-149"},"PeriodicalIF":1.3,"publicationDate":"2023-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136134855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches","authors":"Güler Yavuz Temel","doi":"10.1111/jedm.12377","DOIUrl":"10.1111/jedm.12377","url":null,"abstract":"<p>The purpose of this study was to investigate multidimensional DIF with a simple and nonsimple structure in the context of multidimensional Graded Response Model (MGRM). This study examined and compared the performance of the IRT-LR and Wald test using MML-EM and MHRM estimation approaches with different test factors and test structures in simulation studies and applying real data sets. When the test structure included two dimensions, the IRT-LR (MML-EM) generally performed better than the Wald test and provided higher power rates. If the test included three dimensions, the methods provided similar performance in DIF detection. In contrast to these results, when the number of dimensions in the test was four, MML-EM estimation completely lost precision in estimating the nonuniform DIF, even with large sample sizes. The Wald with MHRM estimation approaches outperformed the Wald test (MML-EM) and IRT-LR (MML-EM). The Wald test had higher power rate and acceptable type I error rates for nonuniform DIF with the MHRM estimation approach.The small and/or unbalanced sample sizes, small DIF magnitudes, unequal ability distributions between groups, number of dimensions, estimation methods and test structure were evaluated as important test factors for detecting multidimensional DIF.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 1","pages":"69-98"},"PeriodicalIF":1.3,"publicationDate":"2023-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136185515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MSAEM Estimation for Confirmatory Multidimensional Four-Parameter Normal Ogive Models","authors":"Jia Liu, Xiangbin Meng, Gongjun Xu, Wei Gao, Ningzhong Shi","doi":"10.1111/jedm.12378","DOIUrl":"10.1111/jedm.12378","url":null,"abstract":"<p>In this paper, we develop a mixed stochastic approximation expectation-maximization (MSAEM) algorithm coupled with a Gibbs sampler to compute the marginalized maximum a posteriori estimate (MMAPE) of a confirmatory multidimensional four-parameter normal ogive (M4PNO) model. The proposed MSAEM algorithm not only has the computational advantages of the stochastic approximation expectation-maximization (SAEM) algorithm for multidimensional data, but it also alleviates the potential instability caused by label-switching, and then improved the estimation accuracy. Simulation studies are conducted to illustrate the good performance of the proposed MSAEM method, where MSAEM consistently performs better than SAEM and some other existing methods in multidimensional item response theory. Moreover, the proposed method is applied to a real data set from the 2018 Programme for International Student Assessment (PISA) to demonstrate the usefulness of the 4PNO model as well as MSAEM in practice.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 1","pages":"99-124"},"PeriodicalIF":1.3,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135146227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}