{"title":"A Small Sample Correction for Factor Score Regression.","authors":"Jasper Bogaert, Wen Wei Loh, Yves Rosseel","doi":"10.1177/00131644221105505","DOIUrl":"10.1177/00131644221105505","url":null,"abstract":"<p><p>Factor score regression (FSR) is widely used as a convenient alternative to traditional structural equation modeling (SEM) for assessing structural relations between latent variables. But when latent variables are simply replaced by factor scores, biases in the structural parameter estimates often have to be corrected, due to the measurement error in the factor scores. The method of Croon (MOC) is a well-known bias correction technique. However, its standard implementation can render poor quality estimates in small samples (e.g. less than 100). This article aims to develop a small sample correction (SSC) that integrates two different modifications to the standard MOC. We conducted a simulation study to compare the empirical performance of (a) standard SEM, (b) the standard MOC, (c) naive FSR, and (d) the MOC with the proposed SSC. In addition, we assessed the robustness of the performance of the SSC in various models with a different number of predictors and indicators. The results showed that the MOC with the proposed SSC yielded smaller mean squared errors than SEM and the standard MOC in small samples and performed similarly to naive FSR. 
However, naive FSR yielded more biased estimates than the proposed MOC with SSC, by failing to account for measurement error in the factor scores.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 3","pages":"495-519"},"PeriodicalIF":2.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10177321/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10349847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is the Area Under Curve Appropriate for Evaluating the Fit of Psychometric Models?","authors":"Yuting Han, Jihong Zhang, Zhehan Jiang, Dexin Shi","doi":"10.1177/00131644221098182","DOIUrl":"10.1177/00131644221098182","url":null,"abstract":"<p><p>In the literature of modern psychometric modeling, mostly related to item response theory (IRT), the fit of model is evaluated through known indices, such as χ<sup>2</sup>, M2, and root mean square error of approximation (RMSEA) for absolute assessments as well as Akaike information criterion (AIC), consistent AIC (CAIC), and Bayesian information criterion (BIC) for relative comparisons. Recent developments show a merging trend of psychometric and machine learnings, yet there remains a gap in the model fit evaluation, specifically the use of the area under curve (AUC). This study focuses on the behaviors of AUC in fitting IRT models. Rounds of simulations were conducted to investigate AUC's appropriateness (e.g., power and Type I error rate) under various conditions. The results show that AUC possessed certain advantages under certain conditions such as high-dimensional structure with two-parameter logistic (2PL) and some three-parameter logistic (3PL) models, while disadvantages were also obvious when the true model is unidimensional. 
It cautions researchers about the dangers of using AUC solely in evaluating psychometric models.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 3","pages":"586-608"},"PeriodicalIF":2.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10177322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10299668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Polytomous Item Locations in Multicomponent Measuring Instruments: A Note on a Latent Variable Modeling Procedure.","authors":"Tenko Raykov, Martin Pusic","doi":"10.1177/00131644211072829","DOIUrl":"10.1177/00131644211072829","url":null,"abstract":"<p><p>This note is concerned with evaluation of location parameters for polytomous items in multiple-component measuring instruments. A point and interval estimation procedure for these parameters is outlined that is developed within the framework of latent variable modeling. The method permits educational, behavioral, biomedical, and marketing researchers to quantify important aspects of the functioning of items with ordered multiple response options, which follow the popular graded response model. The procedure is routinely and readily applicable in empirical studies using widely circulated software and is illustrated with empirical data.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 3","pages":"630-641"},"PeriodicalIF":2.7,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10177315/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9846843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Bank Assembly and Block Selection in Multidimensional Forced-Choice Adaptive Assessments.","authors":"Rodrigo S Kreitchmann, Miguel A Sorrel, Francisco J Abad","doi":"10.1177/00131644221087986","DOIUrl":"10.1177/00131644221087986","url":null,"abstract":"<p><p>Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, item response theory (IRT) models enable the estimation of nonipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, a simulation study addressed the effect of (a) different bank assembly (with a randomly assembled bank, an optimally assembled bank, and blocks assembled <i>on-the-fly</i> considering every possible pair of items), and (b) block selection rules (i.e., <b>T</b>, and Bayesian <b>D</b> and <b>A</b>-rules) over the estimate accuracy and ipsativity and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a nonadaptive questionnaire was included as baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian <b>A</b>-rule with questionnaires assembled <i>on-the-fly</i>, the <b>T</b>-rule under this method led to the worst results. 
This points to the importance of considering both aspects when designing FC CAT.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"294-321"},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972126/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Confidence Intervals of Item Parameters When Some Item Parameters Take Priors in the 2PL and 3PL Models.","authors":"Insu Paek, Zhongtian Lin, Robert Philip Chalmers","doi":"10.1177/00131644221096431","DOIUrl":"10.1177/00131644221096431","url":null,"abstract":"<p><p>To reduce the chance of Heywood cases or nonconvergence in estimating the 2PL or the 3PL model in the marginal maximum likelihood with the expectation-maximization (MML-EM) estimation method, priors for the item slope parameter in the 2PL model or for the pseudo-guessing parameter in the 3PL model can be used and the marginal maximum a posteriori (MMAP) and posterior standard error (PSE) are estimated. Confidence intervals (CIs) for these parameters and other parameters which did not take any priors were investigated with popular prior distributions, different error covariance estimation methods, test lengths, and sample sizes. A seemingly paradoxical result was that, when priors were taken, the conditions of the error covariance estimation methods known to be better in the literature (Louis or Oakes method in this study) did not yield the best results for the CI performance, while the conditions of the cross-product method for the error covariance estimation which has the tendency of upward bias in estimating the standard errors exhibited better CI performance. 
Other important findings for the CI performance are also discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"375-400"},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972130/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional Forced-Choice CAT With Dominance Items: An Empirical Comparison With Optimal Static Testing Under Different Desirability Matching.","authors":"Yin Lin, Anna Brown, Paul Williams","doi":"10.1177/00131644221077637","DOIUrl":"10.1177/00131644221077637","url":null,"abstract":"<p><p>Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, despite most items developed historically follow dominance response models, research on FC CAT using dominance items is limited. Existing research is heavily dominated by simulations and lacking in empirical deployment. This empirical study trialed a FC CAT with dominance items described by the Thurstonian Item Response Theory model with research participants. This study investigated important practical issues such as the implications of adaptive item selection and social desirability balancing criteria on score distributions, measurement accuracy and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage compared with optimal static tests. 
Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"322-350"},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Range Restriction Affects Factor Analysis: Normality, Estimation, Fit, Loadings, and Reliability.","authors":"Alicia Franco-Martínez, Jesús M Alvarado, Miguel A Sorrel","doi":"10.1177/00131644221081867","DOIUrl":"10.1177/00131644221081867","url":null,"abstract":"<p><p>A sample suffers range restriction (RR) when its variance is reduced comparing with its population variance and, in turn, it fails representing such population. If the RR occurs over the latent factor, not directly over the observed variable, the researcher deals with an indirect RR, common when using convenience samples. This work explores how this problem affects different outputs of the factor analysis: multivariate normality (MVN), estimation process, goodness-of-fit, recovery of factor loadings, and reliability. In doing so, a Monte Carlo study was conducted. Data were generated following the linear selective sampling model, simulating tests varying their sample size ( <math><mrow><mi>N</mi></mrow> </math> = 200 and 500 cases), test size ( <math><mrow><mi>J</mi></mrow> </math> = 6, 12, 18, and 24 items), loading size ( <math><mrow><mi>L</mi></mrow> </math> = .50, .70, and .90), and restriction size (from <math><mrow><mi>R</mi></mrow> </math> = 1, .90, .80, and so on till .10 selection ratio). Our results systematically suggest that an interaction between decreasing the loading size and increasing the restriction size affects the MVN assessment, obstructs the estimation process, and leads to an underestimation of the factor loadings and reliability. However, most of the MVN tests and most of the fit indices employed were nonsensitive to the RR problem. 
We provide some recommendations to applied researchers.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"262-293"},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972127/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Quality of Classification in Mixture Model Simulations.","authors":"Yoona Jang, Sehee Hong","doi":"10.1177/00131644221093619","DOIUrl":"10.1177/00131644221093619","url":null,"abstract":"<p><p>The purpose of this study was to evaluate the degree of classification quality in the basic latent class model when covariates are either included or are not included in the model. To accomplish this task, Monte Carlo simulations were conducted in which the results of models with and without a covariate were compared. Based on these simulations, it was determined that models without a covariate better predicted the number of classes. These findings in general supported the use of the popular three-step approach; with its quality of classification determined to be more than 70% under various conditions of covariate effect, sample size, and quality of indicators. In light of these findings, the practical utility of evaluating classification quality is discussed relative to issues that applied researchers need to carefully consider when applying latent class models.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"351-374"},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972124/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10833189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised Classes, Unsupervised Mixing Proportions: Detection of Bots in a Likert-Type Questionnaire.","authors":"Michael John Ilagan, Carl F Falk","doi":"10.1177/00131644221104220","DOIUrl":"10.1177/00131644221104220","url":null,"abstract":"<p><p>Administering Likert-type questionnaires to online samples risks contamination of the data by malicious computer-generated random responses, also known as bots. Although nonresponsivity indices (NRIs) such as person-total correlations or Mahalanobis distance have shown great promise to detect bots, universal cutoff values are elusive. An initial calibration sample constructed via stratified sampling of bots and humans-real or simulated under a measurement model-has been used to empirically choose cutoffs with a high nominal specificity. However, a high-specificity cutoff is less accurate when the target sample has a high contamination rate. In the present article, we propose the supervised classes, unsupervised mixing proportions (SCUMP) algorithm that chooses a cutoff to maximize accuracy. SCUMP uses a Gaussian mixture model to estimate, unsupervised, the contamination rate in the sample of interest. A simulation study found that, in the absence of model misspecification on the bots, our cutoffs maintained accuracy across varying contamination rates.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"217-239"},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972131/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing a Standardized Effect Size in the POLYSIBTEST Procedure.","authors":"James D Weese, Ronna C Turner, Xinya Liang, Allison Ames, Brandon Crawford","doi":"10.1177/00131644221081011","DOIUrl":"10.1177/00131644221081011","url":null,"abstract":"<p><p>A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and large differential item functioning (DIF) for polytomous response data with three to seven response options. These are provided for researchers studying polytomous data using POLYSIBTEST software that has been published previously. The second simulation study provides one pair of standardized effect size heuristics that can be employed with items having any number of response options and compares true-positive and false-positive rates for the standardized effect size proposed by Weese with one proposed by Zwick et al. and two unstandardized classification procedures (Gierl; Golia). All four procedures retained false-positive rates generally below the level of significance at both moderate and large DIF levels. However, Weese's standardized effect size was not affected by sample size and provided slightly higher true-positive rates than the Zwick et al. and Golia's recommendations, while flagging substantially fewer items that might be characterized as having negligible DIF when compared with Gierl's suggested criterion. 
The proposed effect size allows for easier use and interpretation by practitioners as it can be applied to items with any number of response options and is interpreted as a difference in standard deviation units.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"401-427"},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972129/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}