{"title":"Investigating Confidence Intervals of Item Parameters When Some Item Parameters Take Priors in the 2PL and 3PL Models.","authors":"Insu Paek, Zhongtian Lin, Robert Philip Chalmers","doi":"10.1177/00131644221096431","DOIUrl":"10.1177/00131644221096431","url":null,"abstract":"<p><p>To reduce the chance of Heywood cases or nonconvergence in estimating the 2PL or the 3PL model in the marginal maximum likelihood with the expectation-maximization (MML-EM) estimation method, priors for the item slope parameter in the 2PL model or for the pseudo-guessing parameter in the 3PL model can be used and the marginal maximum a posteriori (MMAP) and posterior standard error (PSE) are estimated. Confidence intervals (CIs) for these parameters and other parameters which did not take any priors were investigated with popular prior distributions, different error covariance estimation methods, test lengths, and sample sizes. A seemingly paradoxical result was that, when priors were taken, the conditions of the error covariance estimation methods known to be better in the literature (Louis or Oakes method in this study) did not yield the best results for the CI performance, while the conditions of the cross-product method for the error covariance estimation which has the tendency of upward bias in estimating the standard errors exhibited better CI performance. Other important findings for the CI performance are also discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972130/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Bank Assembly and Block Selection in Multidimensional Forced-Choice Adaptive Assessments.","authors":"Rodrigo S Kreitchmann, Miguel A Sorrel, Francisco J Abad","doi":"10.1177/00131644221087986","DOIUrl":"10.1177/00131644221087986","url":null,"abstract":"<p><p>Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, item response theory (IRT) models enable the estimation of nonipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, a simulation study addressed the effect of (a) different bank assembly (with a randomly assembled bank, an optimally assembled bank, and blocks assembled <i>on-the-fly</i> considering every possible pair of items), and (b) block selection rules (i.e., <b>T</b>, and Bayesian <b>D</b> and <b>A</b>-rules) over the estimate accuracy and ipsativity and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a nonadaptive questionnaire was included as baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian <b>A</b>-rule with questionnaires assembled <i>on-the-fly</i>, the <b>T</b>-rule under this method led to the worst results. This points out to the importance of considering both aspects when designing FC CAT.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972126/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Range Restriction Affects Factor Analysis: Normality, Estimation, Fit, Loadings, and Reliability.","authors":"Alicia Franco-Martínez, Jesús M Alvarado, Miguel A Sorrel","doi":"10.1177/00131644221081867","DOIUrl":"10.1177/00131644221081867","url":null,"abstract":"<p><p>A sample suffers range restriction (RR) when its variance is reduced comparing with its population variance and, in turn, it fails representing such population. If the RR occurs over the latent factor, not directly over the observed variable, the researcher deals with an indirect RR, common when using convenience samples. This work explores how this problem affects different outputs of the factor analysis: multivariate normality (MVN), estimation process, goodness-of-fit, recovery of factor loadings, and reliability. In doing so, a Monte Carlo study was conducted. Data were generated following the linear selective sampling model, simulating tests varying their sample size ( <math><mrow><mi>N</mi></mrow> </math> = 200 and 500 cases), test size ( <math><mrow><mi>J</mi></mrow> </math> = 6, 12, 18, and 24 items), loading size ( <math><mrow><mi>L</mi></mrow> </math> = .50, .70, and .90), and restriction size (from <math><mrow><mi>R</mi></mrow> </math> = 1, .90, .80, and so on till .10 selection ratio). Our results systematically suggest that an interaction between decreasing the loading size and increasing the restriction size affects the MVN assessment, obstructs the estimation process, and leads to an underestimation of the factor loadings and reliability. However, most of the MVN tests and most of the fit indices employed were nonsensitive to the RR problem. We provide some recommendations to applied researchers.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972127/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional Forced-Choice CAT With Dominance Items: An Empirical Comparison With Optimal Static Testing Under Different Desirability Matching.","authors":"Yin Lin, Anna Brown, Paul Williams","doi":"10.1177/00131644221077637","DOIUrl":"10.1177/00131644221077637","url":null,"abstract":"<p><p>Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, despite most items developed historically follow dominance response models, research on FC CAT using dominance items is limited. Existing research is heavily dominated by simulations and lacking in empirical deployment. This empirical study trialed a FC CAT with dominance items described by the Thurstonian Item Response Theory model with research participants. This study investigated important practical issues such as the implications of adaptive item selection and social desirability balancing criteria on score distributions, measurement accuracy and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage compared with optimal static tests. Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Quality of Classification in Mixture Model Simulations.","authors":"Yoona Jang, Sehee Hong","doi":"10.1177/00131644221093619","DOIUrl":"10.1177/00131644221093619","url":null,"abstract":"<p><p>The purpose of this study was to evaluate the degree of classification quality in the basic latent class model when covariates are either included or are not included in the model. To accomplish this task, Monte Carlo simulations were conducted in which the results of models with and without a covariate were compared. Based on these simulations, it was determined that models without a covariate better predicted the number of classes. These findings in general supported the use of the popular three-step approach; with its quality of classification determined to be more than 70% under various conditions of covariate effect, sample size, and quality of indicators. In light of these findings, the practical utility of evaluating classification quality is discussed relative to issues that applied researchers need to carefully consider when applying latent class models.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972124/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10833189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised Classes, Unsupervised Mixing Proportions: Detection of Bots in a Likert-Type Questionnaire.","authors":"Michael John Ilagan, Carl F Falk","doi":"10.1177/00131644221104220","DOIUrl":"10.1177/00131644221104220","url":null,"abstract":"<p><p>Administering Likert-type questionnaires to online samples risks contamination of the data by malicious computer-generated random responses, also known as bots. Although nonresponsivity indices (NRIs) such as person-total correlations or Mahalanobis distance have shown great promise to detect bots, universal cutoff values are elusive. An initial calibration sample constructed via stratified sampling of bots and humans-real or simulated under a measurement model-has been used to empirically choose cutoffs with a high nominal specificity. However, a high-specificity cutoff is less accurate when the target sample has a high contamination rate. In the present article, we propose the supervised classes, unsupervised mixing proportions (SCUMP) algorithm that chooses a cutoff to maximize accuracy. SCUMP uses a Gaussian mixture model to estimate, unsupervised, the contamination rate in the sample of interest. A simulation study found that, in the absence of model misspecification on the bots, our cutoffs maintained accuracy across varying contamination rates.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972131/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing a Standardized Effect Size in the POLYSIBTEST Procedure.","authors":"James D Weese, Ronna C Turner, Xinya Liang, Allison Ames, Brandon Crawford","doi":"10.1177/00131644221081011","DOIUrl":"10.1177/00131644221081011","url":null,"abstract":"<p><p>A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and large differential item functioning (DIF) for polytomous response data with three to seven response options. These are provided for researchers studying polytomous data using POLYSIBTEST software that has been published previously. The second simulation study provides one pair of standardized effect size heuristics that can be employed with items having any number of response options and compares true-positive and false-positive rates for the standardized effect size proposed by Weese with one proposed by Zwick et al. and two unstandardized classification procedures (Gierl; Golia). All four procedures retained false-positive rates generally below the level of significance at both moderate and large DIF levels. However, Weese's standardized effect size was not affected by sample size and provided slightly higher true-positive rates than the Zwick et al. and Golia's recommendations, while flagging substantially fewer items that might be characterized as having negligible DIF when compared with Gierl's suggested criterion. The proposed effect size allows for easier use and interpretation by practitioners as it can be applied to items with any number of response options and is interpreted as a difference in standard deviation units.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972129/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summary Intervals for Model-Based Classification Accuracy and Consistency Indices.","authors":"Oscar Gonzalez","doi":"10.1177/00131644221092347","DOIUrl":"https://doi.org/10.1177/00131644221092347","url":null,"abstract":"<p><p>When scores are used to make decisions about respondents, it is of interest to estimate classification accuracy (CA), the probability of making a correct decision, and classification consistency (CC), the probability of making the same decision across two parallel administrations of the measure. Model-based estimates of CA and CC computed from the linear factor model have been recently proposed, but parameter uncertainty of the CA and CC indices has not been investigated. This article demonstrates how to estimate percentile bootstrap confidence intervals and Bayesian credible intervals for CA and CC indices, which have the added benefit of incorporating the sampling variability of the parameters of the linear factor model to summary intervals. Results from a small simulation study suggest that percentile bootstrap confidence intervals have appropriate confidence interval coverage, although displaying a small negative bias. However, Bayesian credible intervals with diffused priors have poor interval coverage, but their coverage improves once empirical, weakly informative priors are used. The procedures are illustrated by estimating CA and CC indices from a measure used to identify individuals low on mindfulness for a hypothetical intervention, and R code is provided to facilitate the implementation of the procedures.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972125/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Stopping Criterion for Rasch Trees Based on the Mantel-Haenszel Effect Size Measure for Differential Item Functioning.","authors":"Mirka Henninger, Rudolf Debelak, Carolin Strobl","doi":"10.1177/00131644221077135","DOIUrl":"10.1177/00131644221077135","url":null,"abstract":"<p><p>To detect differential item functioning (DIF), Rasch trees search for optimal splitpoints in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. What would be more desirable is an approach that is driven more by effect size rather than sample size. In order to achieve this, we suggest to implement an additional stopping criterion: the popular Educational Testing Service (ETS) classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy-to-use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806517/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing Essential Unidimensionality of Scales and Structural Coefficient Bias.","authors":"Xiaoling Liu, Pei Cao, Xinzhen Lai, Jianbing Wen, Yanyun Yang","doi":"10.1177/00131644221075580","DOIUrl":"10.1177/00131644221075580","url":null,"abstract":"<p><p>Percentage of uncontaminated correlations (PUC), explained common variance (ECV), and omega hierarchical (ω<sub>H</sub>) have been used to assess the degree to which a scale is essentially unidimensional and to predict structural coefficient bias when a unidimensional measurement model is fit to multidimensional data. The usefulness of these indices has been investigated in the context of bifactor models with balanced structures. This study extends the examination by focusing on bifactor models with unbalanced structures. The maximum and minimum PUC values given the total number of items and factors were derived. The usefulness of PUC, ECV, and ω<sub>H</sub> in predicting structural coefficient bias was examined under a variety of structural regression models with bifactor measurement components. Results indicated that the performance of these indices in predicting structural coefficient bias depended on whether the bifactor measurement model had a balanced or unbalanced structure. PUC failed to predict structural coefficient bias when the bifactor model had an unbalanced structure. ECV performed reasonably well, but worse than ω<sub>H</sub>.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806515/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}