{"title":"A New Method to Balance Measurement Accuracy and Attribute Coverage in Cognitive Diagnostic Computerized Adaptive Testing.","authors":"Xiaojian Sun, Björn Andersson, Tao Xin","doi":"10.1177/01466216211040489","DOIUrl":"https://doi.org/10.1177/01466216211040489","url":null,"abstract":"<p><p>As one of the important research areas of cognitive diagnosis assessment, cognitive diagnostic computerized adaptive testing (CD-CAT) has received much attention in recent years. Measurement accuracy is the major theme in CD-CAT, and both the item selection method and the attribute coverage have a crucial effect on measurement accuracy. A new attribute coverage index, the ratio of test length to the number of attributes (RTA), is introduced in the current study. RTA is appropriate when the item pool comprises many items that measure multiple attributes where it can both produce acceptable measurement accuracy and balance the attribute coverage. With simulations, the new index is compared to the original item selection method (ORI) and the attribute balance index (ABI), which have been proposed in previous studies. The results show that (1) the RTA method produces comparable measurement accuracy to the ORI method under most item selection methods; (2) the RTA method produces higher measurement accuracy than the ABI method for most item selection methods, with the exception of the mutual information item selection method; (3) the RTA method prefers items that measure multiple attributes, compared to the ORI and ABI methods, while the ABI prefers items that measure a single attribute; and (4) the RTA method performs better than the ORI method with respect to attribute coverage, while it performs worse than the ABI with long tests.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 7-8","pages":"463-476"},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640349/pdf/10.1177_01466216211040489.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39692926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Information for Multidimensional Tests.","authors":"Katherine G Jonas","doi":"10.1177/01466216211042803","DOIUrl":"https://doi.org/10.1177/01466216211042803","url":null,"abstract":"<p><p>New measures of test information, termed global information, quantify test information relative to the entire range of the trait being assessed. Estimating global information relative to a non-informative prior distribution results in a measure of how much information could be gained by administering the test to an unspecified examinee. Currently, such measures have been developed only for unidimensional tests. This study introduces measures of multidimensional global test information and validates them in simulated data. Then, the utility of global test information is tested in neuropsychological data collected as part of Rush University's Memory and Aging Project. These measures allow for direct comparison of complex tests calibrated in different samples, facilitating test development and selection.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 7-8","pages":"494-517"},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640353/pdf/10.1177_01466216211042803.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39693867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"<i>bmggum</i>: An R Package for Bayesian Estimation of the Multidimensional Generalized Graded Unfolding Model With Covariates.","authors":"Naidan Tu, Bo Zhang, Lawrence Angrave, Tianjun Sun","doi":"10.1177/01466216211040488","DOIUrl":"https://doi.org/10.1177/01466216211040488","url":null,"abstract":"<p><p>Over the past couple of decades, there has been an increasing interest in adopting ideal point models to represent noncognitive constructs, as they have been demonstrated to better measure typical behaviors than traditional dominance models do. The generalized graded unfolding model (<i>GGUM</i>) has consistently been the most popular ideal point model among researchers and practitioners. However, the GGUM2004 software and the later developed <i>GGUM</i> package in R can only handle unidimensional models despite the fact that many noncognitive constructs are multidimensional in nature. In addition, GGUM2004 and the <i>GGUM</i> package often yield unreasonable estimates of item parameters and standard errors. To address these issues, we developed the new open-source <i>bmggum</i> R package that is capable of estimating both unidimensional and multidimensional <i>GGUM</i> using a fully Bayesian approach, with supporting capabilities of stabilizing parameterization, incorporating person covariates, estimating constrained models, providing fit diagnostics, producing convergence metrics, and effectively handling missing data.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 7-8","pages":"553-555"},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640348/pdf/10.1177_01466216211040488.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39694261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial Measurement Invariance: Extending and Evaluating the Cluster Approach for Identifying Anchor Items.","authors":"Steffi Pohl, Daniel Schulze, Eric Stets","doi":"10.1177/01466216211042809","DOIUrl":"https://doi.org/10.1177/01466216211042809","url":null,"abstract":"<p><p>When measurement invariance does not hold, researchers aim for partial measurement invariance by identifying anchor items that are assumed to be measurement invariant. In this paper, we build on Bechger and Maris's approach for identification of anchor items. Instead of identifying differential item functioning (DIF)-free items, they propose to identify different sets of items that are invariant in item parameters within the same item set. We extend their approach by an additional step in order to allow for identification of homogeneously functioning item sets. We evaluate the performance of the extended cluster approach under various conditions and compare its performance to that of previous approaches, that are the equal-mean difficulty (EMD) approach and the iterative forward approach. We show that the EMD and the iterative forward approaches perform well in conditions with balanced DIF or when DIF is small. In conditions with large and unbalanced DIF, they fail to recover the true group mean differences. With appropriate threshold settings, the cluster approach identified a cluster that resulted in unbiased mean difference estimates in all conditions. Compared to previous approaches, the cluster approach allows for a variety of different assumptions as well as for depicting the uncertainty in the results that stem from the choice of the assumption. Using a real data set, we illustrate how the assumptions of the previous approaches may be incorporated in the cluster approach and how the chosen assumption impacts the results.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 7-8","pages":"477-493"},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/c9/d1/10.1177_01466216211042809.PMC8640350.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39693866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IRTGUI: An R Package for Unidimensional Item Response Theory Analysis With a Graphical User Interface.","authors":"Huseyin Yildiz","doi":"10.1177/01466216211040532","DOIUrl":"https://doi.org/10.1177/01466216211040532","url":null,"abstract":"<p><p>In the last decade, many R packages were published to perform item response theory (IRT) analysis. Some researchers and practitioners have difficulty in using these functional tools because of their insufficient coding skills. The <i>IRTGUI</i> package provides these researchers a user-friendly GUI where they can perform unidimensional IRT analysis without coding skills. Using the <i>IRTGUI</i> package, person and item parameters, model and item fit indices can be obtained. Dimensionality and local independence assumptions can be tested. With the <i>IRTGUI</i> package, users can generate dichotomous data sets with customizable conditions. Also, Wright Maps, item characteristics and information curves can be graphically displayed. All outputs can be easily downloaded by users.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 7-8","pages":"551-552"},"PeriodicalIF":1.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8640354/pdf/10.1177_01466216211040532.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39693870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the Impact of Noneffortful Responses on Individual-Level Scores: Can the Effort-Moderated IRT Model Serve as a Solution?","authors":"Joseph A Rios, James Soland","doi":"10.1177/01466216211013896","DOIUrl":"10.1177/01466216211013896","url":null,"abstract":"<p><p>Suboptimal effort is a major threat to valid score-based inferences. While the effects of such behavior have been frequently examined in the context of mean group comparisons, minimal research has considered its effects on individual score use (e.g., identifying students for remediation). Focusing on the latter context, this study addressed two related questions via simulation and applied analyses. First, we investigated how much including noneffortful responses in scoring using a three-parameter logistic (3PL) model affects person parameter recovery and classification accuracy for noneffortful responders. Second, we explored whether improvements in these individual-level inferences were observed when employing the Effort Moderated IRT (EM-IRT) model under conditions in which its assumptions were met and violated. Results demonstrated that including 10% noneffortful responses in scoring led to average bias in ability estimates and misclassification rates by as much as 0.15 <i>SD</i>s and 7%, respectively. These results were mitigated when employing the EM-IRT model, particularly when model assumptions were met. However, once model assumptions were violated, the EM-IRT model's performance deteriorated, though still outperforming the 3PL model. Thus, findings from this study show that (a) including noneffortful responses when using individual scores can lead to potential unfounded inferences and potential score misuse, and (b) the negative impact that noneffortful responding has on person ability estimates and classification accuracy can be mitigated by employing the EM-IRT model, particularly when its assumptions are met.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 6","pages":"391-406"},"PeriodicalIF":1.2,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8381694/pdf/10.1177_01466216211013896.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39451274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Signal Detection Model for Multiple-Choice Exams.","authors":"Lawrence T DeCarlo","doi":"10.1177/01466216211014599","DOIUrl":"https://doi.org/10.1177/01466216211014599","url":null,"abstract":"<p><p>A model for multiple-choice exams is developed from a signal-detection perspective. A correct alternative in a multiple-choice exam can be viewed as being a signal embedded in noise (incorrect alternatives). Examinees are assumed to have perceptions of the plausibility of each alternative, and the decision process is to choose the most plausible alternative. It is also assumed that each examinee either knows or does not know each item. These assumptions together lead to a <i>signal detection choice model</i> for multiple-choice exams. The model can be viewed, statistically, as a mixture extension, with random mixing, of the traditional choice model, or similarly, as a grade-of-membership extension. A version of the model with extreme value distributions is developed, in which case the model simplifies to a mixture multinomial logit model with random mixing. The approach is shown to offer measures of item discrimination and difficulty, along with information about the relative plausibility of each of the alternatives. The model, parameters, and measures derived from the parameters are compared to those obtained with several commonly used item response theory models. An application of the model to an educational data set is presented.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 6","pages":"423-440"},"PeriodicalIF":1.2,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/01466216211014599","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39451276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Guessing: An Alternative Adjusted Positive Learning Estimator and Comparing Probability Misspecification With Monte Carlo Simulations.","authors":"Ben O Smith, Dustin R White","doi":"10.1177/01466216211013905","DOIUrl":"https://doi.org/10.1177/01466216211013905","url":null,"abstract":"<p><p>Practitioners in the sciences have used the \"flow\" of knowledge (post-test score minus pre-test score) to measure learning in the classroom for the past 50 years. Walstad and Wagner, and Smith and Wagner moved this practice forward by disaggregating the flow of knowledge and accounting for student guessing. These estimates are sensitive to misspecification of the probability of guessing correct. This work provides guidance to practitioners and researchers facing this problem. We introduce a transformed measure of true positive learning that under some knowable conditions performs better when students' ability to guess correctly is misspecified and converges to Hake's normalized learning gain estimator under certain conditions. We then use simulations to compare the accuracy of two estimation techniques under various violations of the assumptions of those techniques. Using recursive partitioning trees fitted to our simulation results, we provide the practitioner concrete guidance based on a set of yes/no questions.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 6","pages":"441-458"},"PeriodicalIF":1.2,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/01466216211013905","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39451277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Speed Sensitivity Parameter in the Lognormal Model for Response Times and Implications for High-Stakes Measurement Practice.","authors":"Benjamin Becker, Dries Debeer, Sebastian Weirich, Frank Goldhammer","doi":"10.1177/01466216211008530","DOIUrl":"10.1177/01466216211008530","url":null,"abstract":"<p><p>In high-stakes testing, often multiple test forms are used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comparability of test forms regarding speededness and ability estimation was investigated. The lognormal measurement model for response times by van der Linden was compared with its extension by Klein Entink, van der Linden, and Fox, which includes a speed sensitivity parameter. An empirical data example was used to show that the extended model can fit the data better than the model without speed sensitivity parameters. A simulation was conducted, which showed that test forms with different average speed sensitivity yielded substantial different ability estimates for slow test takers, especially for test takers with high ability. Therefore, the use of the extended lognormal model for response times is recommended for the calibration of item pools in high-stakes testing situations. Limitations to the proposed approach and further research questions are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 6","pages":"407-422"},"PeriodicalIF":1.2,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8381695/pdf/10.1177_01466216211008530.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39451275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asymptotic Standard Errors of Generalized Partial Credit Model True Score Equating Using Characteristic Curve Methods.","authors":"Zhonghua Zhang","doi":"10.1177/01466216211013101","DOIUrl":"https://doi.org/10.1177/01466216211013101","url":null,"abstract":"<p><p>In this study, the delta method was applied to estimate the standard errors of the true score equating when using the characteristic curve methods with the generalized partial credit model in test equating under the context of the common-item nonequivalent groups equating design. Simulation studies were further conducted to compare the performance of the delta method with that of the bootstrap method and the multiple imputation method. The results indicated that the standard errors produced by the delta method were very close to the criterion empirical standard errors as well as those yielded by the bootstrap method and the multiple imputation method under all the manipulated conditions.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"45 5","pages":"331-345"},"PeriodicalIF":1.2,"publicationDate":"2021-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1177/01466216211013101","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39452434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}