{"title":"Comparing RMSEA-Based Indices for Assessing Measurement Invariance in Confirmatory Factor Models","authors":"Nataly Beribisky, Gregory R. Hancock","doi":"10.1177/00131644231202949","DOIUrl":"https://doi.org/10.1177/00131644231202949","url":null,"abstract":"Fit indices are descriptive measures that can help evaluate how well a confirmatory factor analysis (CFA) model fits a researcher’s data. In multigroup models, before between-group comparisons are made, fit indices may be used to evaluate measurement invariance by assessing the degree to which multiple groups’ data are consistent with increasingly constrained nested models. One such fit index is an adaptation of the root mean square error of approximation (RMSEA) called RMSEA D . This index embeds the chi-square and degree-of-freedom differences into a modified RMSEA formula. The present study comprehensively compared RMSEA D to ΔRMSEA, the difference between two RMSEA values associated with a comparison of nested models. The comparison consisted of both derivations as well as a population analysis using one-factor CFA models with features common to those found in practical research. The findings demonstrated that for the same model, RMSEA D will always have increased sensitivity relative to ΔRMSEA with an increasing number of indicator variables. The study also indicated that RMSEA D had increased ability to detect noninvariance relative to ΔRMSEA in one-factor models. For these reasons, when evaluating measurement invariance, RMSEA D is recommended instead of ΔRMSEA.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135326183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Item Parameter Recovery: Sensitivity to Prior Distribution","authors":"Christine E. DeMars, Paulius Satkus","doi":"10.1177/00131644231203688","DOIUrl":"https://doi.org/10.1177/00131644231203688","url":null,"abstract":"Marginal maximum likelihood, a common estimation method for item response theory models, is not inherently a Bayesian procedure. However, due to estimation difficulties, Bayesian priors are often applied to the likelihood when estimating 3PL models, especially with small samples. Little focus has been placed on choosing the priors for marginal maximum estimation. In this study, using sample sizes of 1,000 or smaller, not using priors often led to extreme, implausible parameter estimates. Applying prior distributions to the c-parameters alleviated the estimation problems with samples of 500 or more; for the samples of 100, priors on both the a-parameters and c-parameters were needed. Estimates were biased when the mode of the prior did not match the true parameter value, but the degree of the bias did not depend on the strength of the prior unless it was extremely informative. The root mean squared error (RMSE) of the a-parameters and b-parameters did not depend greatly on either the mode or the strength of the prior unless it was extremely informative. The RMSE of the c-parameters, like the bias, depended on the mode of the prior for c.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136067363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Factor Analytic Thurstonian Forced-Choice Models: Current Status and Issues","authors":"Markus T. Jansen, Ralf Schulze","doi":"10.1177/00131644231205011","DOIUrl":"https://doi.org/10.1177/00131644231205011","url":null,"abstract":"Thurstonian forced-choice modeling is considered to be a powerful new tool to estimate item and person parameters while simultaneously testing the model fit. This assessment approach is associated with the aim of reducing faking and other response tendencies that plague traditional self-report trait assessments. As a result of major recent methodological developments, the estimation of normative trait scores has become possible in addition to the computation of only ipsative scores. This opened up the important possibility of comparisons between individuals with forced-choice assessment procedures. With item response theory (IRT) methods, a multidimensional forced-choice (MFC) format has also been proposed to estimate individual scores. Customarily, items to assess different traits are presented in blocks, often triplets, in applications of the MFC, which is an efficient form of item presentation but also a simplification of the original models. The present study provides a comprehensive review of the present status of Thurstonian forced-choice models and their variants. Critical features of the current models, especially the block models, are identified and discussed. It is concluded that MFC modeling with item blocks is highly problematic and yields biased results. In particular, the often-recommended presentation of blocks with items that are keyed in different directions of a trait proves to be counterproductive considering the goal to reduce response tendencies. The consequences and implications of the highlighted issues are further discussed.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136069087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thinking About Sum Scores Yet Again, Maybe the Last Time, We Don’t Know, Oh No . . .: A Comment on","authors":"Keith F. Widaman, William Revelle","doi":"10.1177/00131644231205310","DOIUrl":"https://doi.org/10.1177/00131644231205310","url":null,"abstract":"The relative advantages and disadvantages of sum scores and estimated factor scores are issues of concern for substantive research in psychology. Recently, while championing estimated factor scores over sum scores, McNeish offered a trenchant rejoinder to an article by Widaman and Revelle, which had critiqued an earlier paper by McNeish and Wolf. In the recent contribution, McNeish misrepresented a number of claims by Widaman and Revelle, rendering moot his criticisms of Widaman and Revelle. Notably, McNeish chose to avoid confronting a key strength of sum scores stressed by Widaman and Revelle—the greater comparability of results across studies if sum scores are used. Instead, McNeish pivoted to present a host of simulation studies to identify relative strengths of estimated factor scores. Here, we review our prior claims and, in the process, deflect purported criticisms by McNeish. We discuss briefly issues related to simulated data and empirical data that provide evidence of strengths of each type of score. In doing so, we identified a second strength of sum scores: superior cross-validation of results across independent samples of empirical data, at least for samples of moderate size. We close with consideration of four general issues concerning sum scores and estimated factor scores that highlight the contrasts between positions offered by McNeish and by us, issues of importance when pursuing applied research in our field.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135859120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Person-Fit Indices to Detect Social Desirability Bias.","authors":"Sanaz Nazari, Walter L Leite, A Corinne Huggins-Manley","doi":"10.1177/00131644221129577","DOIUrl":"10.1177/00131644221129577","url":null,"abstract":"<p><p>Social desirability bias (SDB) has been a major concern in educational and psychological assessments when measuring latent variables because it has the potential to introduce measurement error and bias in assessments. Person-fit indices can detect bias in the form of misfitted response vectors. The objective of this study was to compare the performance of 14 person-fit indices to identify SDB in simulated responses. The area under the curve (AUC) of receiver operating characteristic (ROC) curve analysis was computed to evaluate the predictive power of these statistics. The findings showed that the agreement statistic <math><mrow><mo>(</mo><mi>A</mi><mo>)</mo></mrow></math> outperformed all other person-fit indices, while the disagreement statistic <math><mrow><mo>(</mo><mi>D</mi><mo>)</mo></mrow></math>, dependability statistic <math><mrow><mo>(</mo><mi>E</mi><mo>)</mo></mrow></math>, and the number of Guttman errors <math><mrow><mo>(</mo><mi>G</mi><mo>)</mo></mrow></math> also demonstrated high AUCs to detect SDB. Recommendations for practitioners to use these fit indices are provided.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470160/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10208755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Rating Scale Malfunctioning With the Partial Credit Model and Generalized Partial Credit Model.","authors":"Stefanie A Wind","doi":"10.1177/00131644221116292","DOIUrl":"10.1177/00131644221116292","url":null,"abstract":"<p><p>Rating scale analysis techniques provide researchers with practical tools for examining the degree to which ordinal rating scales (e.g., Likert-type scales or performance assessment rating scales) function in psychometrically useful ways. When rating scales function as expected, researchers can interpret ratings in the intended direction (i.e., lower ratings mean \"less\" of a construct than higher ratings), distinguish between categories in the scale (i.e., each category reflects a unique level of the construct), and compare ratings across elements of the measurement instrument, such as individual items. Although researchers have used these techniques in a variety of contexts, studies are limited that systematically explore their sensitivity to problematic rating scale characteristics (i.e., \"rating scale malfunctioning\"). I used a real data analysis and a simulation study to systematically explore the sensitivity of rating scale analysis techniques based on two popular polytomous item response theory (IRT) models: the partial credit model (PCM) and the generalized partial credit model (GPCM). Overall, results indicated that both models provide valuable information about rating scale threshold ordering and precision that can help researchers understand how their rating scales are functioning and identify areas for further investigation or revision. However, there were some differences between models in their sensitivity to rating scale malfunctioning in certain conditions. Implications for research and practice are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470161/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10506045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equidistant Response Options on Likert-Type Instruments: Testing the Interval Scaling Assumption Using Mplus.","authors":"Georgios Sideridis, Ioannis Tsaousis, Hanan Ghamdi","doi":"10.1177/00131644221130482","DOIUrl":"10.1177/00131644221130482","url":null,"abstract":"<p><p>The purpose of the present study was to provide the means to evaluate the \"interval-scaling\" assumption that governs the use of parametric statistics and continuous data estimators in self-report instruments that utilize Likert-type scaling. Using simulated and real data, the methodology to test for this important assumption is evaluated using the popular software Mplus 8.8. Evidence on meeting the assumption is provided using the Wald test and the equidistant index. It is suggested that routine evaluations of self-report instruments engage the present methodology so that the most appropriate estimator will be implemented when testing the construct validity of self-report instruments.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470166/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10357822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Position of Correct Option and Distractors Impacts Responses to Multiple-Choice Items: Evidence From a National Test.","authors":"Séverin Lions, Pablo Dartnell, Gabriela Toledo, María Inés Godoy, Nora Córdova, Daniela Jiménez, Julie Lemarié","doi":"10.1177/00131644221132335","DOIUrl":"10.1177/00131644221132335","url":null,"abstract":"<p><p>Even though the impact of the position of response options on answers to multiple-choice items has been investigated for decades, it remains debated. Research on this topic is inconclusive, perhaps because too few studies have obtained experimental data from large-sized samples in a real-world context and have manipulated the position of both correct response and distractors. Since multiple-choice tests' outcomes can be strikingly consequential and option position effects constitute a potential source of measurement error, these effects should be clarified. In this study, two experiments in which the position of correct response and distractors was carefully manipulated were performed within a Chilean national high-stakes standardized test, responded by 195,715 examinees. Results show small but clear and systematic effects of options position on examinees' responses in both experiments. They consistently indicate that a five-option item is slightly easier when the correct response is in A rather than E and when the most attractive distractor is after and far away from the correct response. They clarify and extend previous findings, showing that the appeal of all options is influenced by position. The existence and nature of a potential interference phenomenon between the options' processing are discussed, and implications for test development are considered.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470158/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10306861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact and Detection of Uniform Differential Item Functioning for Continuous Item Response Models.","authors":"W Holmes Finch","doi":"10.1177/00131644221111993","DOIUrl":"10.1177/00131644221111993","url":null,"abstract":"<p><p>Psychometricians have devoted much research and attention to categorical item responses, leading to the development and widespread use of item response theory for the estimation of model parameters and identification of items that do not perform in the same way for examinees from different population subgroups (e.g., differential item functioning [DIF]). With the increasing use of computer-based measurement, use of items with a continuous response modality is becoming more common. Models for use with these items have been developed and refined in recent years, but less attention has been devoted to investigating DIF for these continuous response models (CRMs). Therefore, the purpose of this simulation study was to compare the performance of three potential methods for assessing DIF for CRMs, including regression, the MIMIC model, and factor invariance testing. Study results revealed that the MIMIC model provided a combination of Type I error control and relatively high power for detecting DIF. Implications of these findings are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470162/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10506042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Preknowledge Cheating via Innovative Measures: A Mixture Hierarchical Model for Jointly Modeling Item Responses, Response Times, and Visual Fixation Counts.","authors":"Kaiwen Man, Jeffrey R Harring","doi":"10.1177/00131644221136142","DOIUrl":"10.1177/00131644221136142","url":null,"abstract":"<p><p>Preknowledge cheating jeopardizes the validity of inferences based on test results. Many methods have been developed to detect preknowledge cheating by jointly analyzing item responses and response times. Gaze fixations, an essential eye-tracker measure, can be utilized to help detect aberrant testing behavior with improved accuracy beyond using product and process data types in isolation. As such, this study proposes a mixture hierarchical model that integrates item responses, response times, and visual fixation counts collected from an eye-tracker (a) to detect aberrant test takers who have different levels of preknowledge and (b) to account for nuances in behavioral patterns between normally-behaved and aberrant examinees. A Bayesian approach to estimating model parameters is carried out via an MCMC algorithm. Finally, the proposed model is applied to experimental data to illustrate how the model can be used to identify test takers having preknowledge on the test items.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470163/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10525106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}