{"title":"Summary Intervals for Model-Based Classification Accuracy and Consistency Indices.","authors":"Oscar Gonzalez","doi":"10.1177/00131644221092347","DOIUrl":"https://doi.org/10.1177/00131644221092347","url":null,"abstract":"<p><p>When scores are used to make decisions about respondents, it is of interest to estimate classification accuracy (CA), the probability of making a correct decision, and classification consistency (CC), the probability of making the same decision across two parallel administrations of the measure. Model-based estimates of CA and CC computed from the linear factor model have recently been proposed, but parameter uncertainty of the CA and CC indices has not been investigated. This article demonstrates how to estimate percentile bootstrap confidence intervals and Bayesian credible intervals for CA and CC indices, which have the added benefit of incorporating the sampling variability of the parameters of the linear factor model into the summary intervals. Results from a small simulation study suggest that percentile bootstrap confidence intervals have appropriate confidence interval coverage, although they display a small negative bias. In contrast, Bayesian credible intervals with diffuse priors have poor interval coverage, but their coverage improves once empirical, weakly informative priors are used. 
The procedures are illustrated by estimating CA and CC indices from a measure used to identify individuals low on mindfulness for a hypothetical intervention, and R code is provided to facilitate the implementation of the procedures.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 2","pages":"240-261"},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972125/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Stopping Criterion for Rasch Trees Based on the Mantel-Haenszel Effect Size Measure for Differential Item Functioning.","authors":"Mirka Henninger, Rudolf Debelak, Carolin Strobl","doi":"10.1177/00131644221077135","DOIUrl":"10.1177/00131644221077135","url":null,"abstract":"<p><p>To detect differential item functioning (DIF), Rasch trees search for optimal splitpoints in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. What would be more desirable is an approach that is driven by effect size rather than sample size. In order to achieve this, we suggest implementing an additional stopping criterion: the popular Educational Testing Service (ETS) classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy to use for applied researchers, we have implemented the procedure in the statistical software R. 
Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"181-212"},"PeriodicalIF":2.1,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806517/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing Essential Unidimensionality of Scales and Structural Coefficient Bias.","authors":"Xiaoling Liu, Pei Cao, Xinzhen Lai, Jianbing Wen, Yanyun Yang","doi":"10.1177/00131644221075580","DOIUrl":"10.1177/00131644221075580","url":null,"abstract":"<p><p>Percentage of uncontaminated correlations (PUC), explained common variance (ECV), and omega hierarchical (ω<sub>H</sub>) have been used to assess the degree to which a scale is essentially unidimensional and to predict structural coefficient bias when a unidimensional measurement model is fit to multidimensional data. The usefulness of these indices has been investigated in the context of bifactor models with balanced structures. This study extends the examination by focusing on bifactor models with unbalanced structures. The maximum and minimum PUC values given the total number of items and factors were derived. The usefulness of PUC, ECV, and ω<sub>H</sub> in predicting structural coefficient bias was examined under a variety of structural regression models with bifactor measurement components. Results indicated that the performance of these indices in predicting structural coefficient bias depended on whether the bifactor measurement model had a balanced or unbalanced structure. PUC failed to predict structural coefficient bias when the bifactor model had an unbalanced structure. 
ECV performed reasonably well, but worse than ω<sub>H</sub>.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"28-47"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806515/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnostic Classification Model for Forced-Choice Items and Noncognitive Tests.","authors":"Hung-Yu Huang","doi":"10.1177/00131644211069906","DOIUrl":"https://doi.org/10.1177/00131644211069906","url":null,"abstract":"<p><p>The forced-choice (FC) item formats used for noncognitive tests typically develop a set of response options that measure different traits and instruct respondents to make judgments among these options in terms of their preference to control the response biases that are commonly observed in normative tests. Diagnostic classification models (DCMs) can provide information regarding the mastery status of test takers on latent discrete variables and are more commonly used for cognitive tests employed in educational settings than for noncognitive tests. The purpose of this study is to develop a new class of DCM for FC items under the higher-order DCM framework to meet the practical demands of simultaneously controlling for response biases and providing diagnostic classification information. By conducting a series of simulations and calibrating the model parameters with Bayesian estimation, the study shows that, in general, the model parameters can be recovered satisfactorily with the use of long tests and large samples. More attributes improve the precision of the second-order latent trait estimation in a long test, but decrease the classification accuracy and the estimation quality of the structural parameters. When statements are allowed to load on two distinct attributes in paired comparison items, the specific-attribute condition produces better parameter estimation than the overlap-attribute condition. 
Finally, an empirical analysis related to work-motivation measures is presented to demonstrate the applications and implications of the new model.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"146-180"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/5c/8c/10.1177_00131644211069906.PMC9806518.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Simulated Annealing to Investigate Sensitivity of SEM to External Model Misspecification.","authors":"Charles L Fisk, Jeffrey R Harring, Zuchao Shen, Walter Leite, King Yiu Suen, Katerina M Marcoulides","doi":"10.1177/00131644211073121","DOIUrl":"10.1177/00131644211073121","url":null,"abstract":"<p><p>Sensitivity analyses encompass a broad set of post-analytic techniques that are characterized as measuring the potential impact of any factor that has an effect on some output variables of a model. This research focuses on the utility of the simulated annealing algorithm to automatically identify path configurations and parameter values of omitted confounders in structural equation modeling (SEM). An empirical example based on a past published study is used to illustrate how strongly related an omitted variable must be to model variables for the conclusions of an analysis to change. The algorithm is outlined in detail and the results stemming from the sensitivity analysis are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"73-92"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806519/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10494315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Croon's Bias-Corrected Estimation for Multilevel Structural Equation Models with Non-Normal Indicators and Model Misspecifications.","authors":"Kyle Cox, Benjamin Kelcey","doi":"10.1177/00131644221080451","DOIUrl":"10.1177/00131644221080451","url":null,"abstract":"<p><p>Multilevel structural equation models (MSEMs) are well suited for educational research because they accommodate complex systems involving latent variables in multilevel settings. Estimation using Croon's bias-corrected factor score (BCFS) path estimation has recently been extended to MSEMs and demonstrated promise with limited sample sizes. This makes it well suited for planned educational research which often involves sample sizes constrained by logistical and financial factors. However, the performance of BCFS estimation with MSEMs has yet to be thoroughly explored under common but difficult conditions including in the presence of non-normal indicators and model misspecifications. We conducted two simulation studies to evaluate the accuracy and efficiency of the estimator under these conditions. Results suggest that BCFS estimation of MSEMs is often more dependable, more efficient, and less biased than other estimation approaches when sample sizes are limited or model misspecifications are present but is more susceptible to indicator non-normality. 
These results support, supplement, and elucidate previous literature describing the effective performance of BCFS estimation encouraging its utilization as an alternative or supplemental estimator for MSEMs.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"48-72"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806522/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resolving Dimensionality in a Child Assessment Tool: An Application of the Multilevel Bifactor Model.","authors":"Hope O Akaeze, Frank R Lawrence, Jamie Heng-Chieh Wu","doi":"10.1177/00131644221082688","DOIUrl":"10.1177/00131644221082688","url":null,"abstract":"<p><p>Multidimensionality and hierarchical data structure are common in assessment data. These design features, if not accounted for, can threaten the validity of the results and inferences generated from factor analysis, a method frequently employed to assess test dimensionality. In this article, we describe and demonstrate the application of the multilevel bifactor model to address these features in examining test dimensionality. The tool for this exposition is the Child Observation Record Advantage 1.5 (COR-Adv1.5), a child assessment instrument widely used in Head Start programs. Previous studies on this assessment tool reported highly correlated factors and did not account for the nesting of children in classrooms. Results from this study show how the flexibility of the multilevel bifactor model, together with useful model-based statistics, can be harnessed to judge the dimensionality of a test instrument and inform the interpretability of the associated factor scores.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"93-115"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806520/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10494318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power Analysis for Moderator Effects in Longitudinal Cluster Randomized Designs.","authors":"Wei Li, Spyros Konstantopoulos","doi":"10.1177/00131644221077359","DOIUrl":"10.1177/00131644221077359","url":null,"abstract":"<p><p>Cluster randomized control trials often incorporate a longitudinal component where, for example, students are followed over time and student outcomes are measured repeatedly. Besides examining how intervention effects induce changes in outcomes, researchers are sometimes also interested in exploring whether intervention effects on outcomes are modified by moderator variables at the individual (e.g., gender, race/ethnicity) and/or the cluster level (e.g., school urbanicity) over time. This study provides methods for statistical power analysis of moderator effects in two- and three-level longitudinal cluster randomized designs. Power computations take into account clustering effects, the number of measurement occasions, the impact of sample sizes at different levels, covariates effects, and the variance of the moderator variable. Illustrative examples are offered to demonstrate the applicability of the methods.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"116-145"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806516/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of Coefficient Alpha and Its Alternatives: Effects of Different Types of Non-Normality.","authors":"Leifeng Xiao, Kit-Tai Hau","doi":"10.1177/00131644221088240","DOIUrl":"10.1177/00131644221088240","url":null,"abstract":"<p><p>We examined the performance of coefficient alpha and its potential competitors (ordinal alpha, omega total, Revelle's omega total [omega RT], omega hierarchical [omega h], greatest lower bound [GLB], and coefficient <i>H</i>) with continuous and discrete data having different types of non-normality. Results showed the estimation bias was acceptable for continuous data with varying degrees of non-normality when the scales were strong (high loadings). This bias, however, became quite large with moderate strength scales and increased with increasing non-normality. For Likert-type scales, other than omega h, most indices were acceptable with non-normal data having at least four points, and more points were better. For different exponential distributed data, omega RT and GLB were robust, whereas the bias of other indices for binomial-beta distribution was generally large. An examination of an authentic large-scale international survey suggested that its items were at worst moderately non-normal; hence, non-normality was not a big concern. 
We conclude that (a) the requirement of continuous and normally distributed data for alpha may not be necessary for less severely non-normal data; (b) for severely non-normal data, scales should have at least four points, and more points are better; and (c) there is no single gold standard for all data types; other issues such as scale loading, model structure, or scale length are also important.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 1","pages":"5-27"},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806521/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Effect Size Measures for Nested Measurement Models.","authors":"Tenko Raykov, Christine DiStefano, Lisa Calvocoressi, Martin Volker","doi":"10.1177/00131644211066845","DOIUrl":"10.1177/00131644211066845","url":null,"abstract":"<p><p>A class of effect size indices are discussed that evaluate the degree to which two nested confirmatory factor analysis models differ from each other in terms of fit to a set of observed variables. These descriptive effect measures can be used to quantify the impact of parameter restrictions imposed in an initially considered model and are free from an explicit relationship to sample size. The described indices represent the extent to which respective linear combinations of the proportions of explained variance in the manifest variables are changed as a result of introducing the constraints. The indices reflect corresponding aspects of the impact of the restrictions and are independent of their statistical significance or lack thereof. The discussed effect size measures are readily point and interval estimated, using popular software, and their application is illustrated with numerical examples.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"82 6","pages":"1225-1246"},"PeriodicalIF":2.1,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9619317/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10840615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}