{"title":"Investigating Confidence Intervals of Item Parameters When Some Item Parameters Take Priors in the 2PL and 3PL Models.","authors":"Insu Paek, Zhongtian Lin, Robert Philip Chalmers","doi":"10.1177/00131644221096431","DOIUrl":"10.1177/00131644221096431","url":null,"abstract":"<p><p>To reduce the chance of Heywood cases or nonconvergence in estimating the 2PL or the 3PL model in the marginal maximum likelihood with the expectation-maximization (MML-EM) estimation method, priors for the item slope parameter in the 2PL model or for the pseudo-guessing parameter in the 3PL model can be used and the marginal maximum a posteriori (MMAP) and posterior standard error (PSE) are estimated. Confidence intervals (CIs) for these parameters and other parameters which did not take any priors were investigated with popular prior distributions, different error covariance estimation methods, test lengths, and sample sizes. A seemingly paradoxical result was that, when priors were taken, the conditions of the error covariance estimation methods known to be better in the literature (Louis or Oakes method in this study) did not yield the best results for the CI performance, while the conditions of the cross-product method for the error covariance estimation which has the tendency of upward bias in estimating the standard errors exhibited better CI performance. Other important findings for the CI performance are also discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972130/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Bank Assembly and Block Selection in Multidimensional Forced-Choice Adaptive Assessments.","authors":"Rodrigo S Kreitchmann, Miguel A Sorrel, Francisco J Abad","doi":"10.1177/00131644221087986","DOIUrl":"10.1177/00131644221087986","url":null,"abstract":"<p><p>Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, item response theory (IRT) models enable the estimation of nonipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, a simulation study addressed the effect of (a) different bank assembly (with a randomly assembled bank, an optimally assembled bank, and blocks assembled <i>on-the-fly</i> considering every possible pair of items), and (b) block selection rules (i.e., <b>T</b>, and Bayesian <b>D</b> and <b>A</b>-rules) over the estimate accuracy and ipsativity and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a nonadaptive questionnaire was included as baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian <b>A</b>-rule with questionnaires assembled <i>on-the-fly</i>, the <b>T</b>-rule under this method led to the worst results. This points out to the importance of considering both aspects when designing FC CAT.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972126/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Range Restriction Affects Factor Analysis: Normality, Estimation, Fit, Loadings, and Reliability.","authors":"Alicia Franco-Martínez, Jesús M Alvarado, Miguel A Sorrel","doi":"10.1177/00131644221081867","DOIUrl":"10.1177/00131644221081867","url":null,"abstract":"<p><p>A sample suffers range restriction (RR) when its variance is reduced comparing with its population variance and, in turn, it fails representing such population. If the RR occurs over the latent factor, not directly over the observed variable, the researcher deals with an indirect RR, common when using convenience samples. This work explores how this problem affects different outputs of the factor analysis: multivariate normality (MVN), estimation process, goodness-of-fit, recovery of factor loadings, and reliability. In doing so, a Monte Carlo study was conducted. Data were generated following the linear selective sampling model, simulating tests varying their sample size ( <math><mrow><mi>N</mi></mrow> </math> = 200 and 500 cases), test size ( <math><mrow><mi>J</mi></mrow> </math> = 6, 12, 18, and 24 items), loading size ( <math><mrow><mi>L</mi></mrow> </math> = .50, .70, and .90), and restriction size (from <math><mrow><mi>R</mi></mrow> </math> = 1, .90, .80, and so on till .10 selection ratio). Our results systematically suggest that an interaction between decreasing the loading size and increasing the restriction size affects the MVN assessment, obstructs the estimation process, and leads to an underestimation of the factor loadings and reliability. However, most of the MVN tests and most of the fit indices employed were nonsensitive to the RR problem. We provide some recommendations to applied researchers.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972127/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional Forced-Choice CAT With Dominance Items: An Empirical Comparison With Optimal Static Testing Under Different Desirability Matching.","authors":"Yin Lin, Anna Brown, Paul Williams","doi":"10.1177/00131644221077637","DOIUrl":"10.1177/00131644221077637","url":null,"abstract":"<p><p>Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, despite most items developed historically follow dominance response models, research on FC CAT using dominance items is limited. Existing research is heavily dominated by simulations and lacking in empirical deployment. This empirical study trialed a FC CAT with dominance items described by the Thurstonian Item Response Theory model with research participants. This study investigated important practical issues such as the implications of adaptive item selection and social desirability balancing criteria on score distributions, measurement accuracy and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage compared with optimal static tests. Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972128/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Quality of Classification in Mixture Model Simulations.","authors":"Yoona Jang, Sehee Hong","doi":"10.1177/00131644221093619","DOIUrl":"10.1177/00131644221093619","url":null,"abstract":"<p><p>The purpose of this study was to evaluate the degree of classification quality in the basic latent class model when covariates are either included or are not included in the model. To accomplish this task, Monte Carlo simulations were conducted in which the results of models with and without a covariate were compared. Based on these simulations, it was determined that models without a covariate better predicted the number of classes. These findings in general supported the use of the popular three-step approach; with its quality of classification determined to be more than 70% under various conditions of covariate effect, sample size, and quality of indicators. In light of these findings, the practical utility of evaluating classification quality is discussed relative to issues that applied researchers need to carefully consider when applying latent class models.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972124/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10833189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supervised Classes, Unsupervised Mixing Proportions: Detection of Bots in a Likert-Type Questionnaire.","authors":"Michael John Ilagan, Carl F Falk","doi":"10.1177/00131644221104220","DOIUrl":"10.1177/00131644221104220","url":null,"abstract":"<p><p>Administering Likert-type questionnaires to online samples risks contamination of the data by malicious computer-generated random responses, also known as bots. Although nonresponsivity indices (NRIs) such as person-total correlations or Mahalanobis distance have shown great promise to detect bots, universal cutoff values are elusive. An initial calibration sample constructed via stratified sampling of bots and humans-real or simulated under a measurement model-has been used to empirically choose cutoffs with a high nominal specificity. However, a high-specificity cutoff is less accurate when the target sample has a high contamination rate. In the present article, we propose the supervised classes, unsupervised mixing proportions (SCUMP) algorithm that chooses a cutoff to maximize accuracy. SCUMP uses a Gaussian mixture model to estimate, unsupervised, the contamination rate in the sample of interest. A simulation study found that, in the absence of model misspecification on the bots, our cutoffs maintained accuracy across varying contamination rates.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972131/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing a Standardized Effect Size in the POLYSIBTEST Procedure.","authors":"James D Weese, Ronna C Turner, Xinya Liang, Allison Ames, Brandon Crawford","doi":"10.1177/00131644221081011","DOIUrl":"10.1177/00131644221081011","url":null,"abstract":"<p><p>A study was conducted to implement the use of a standardized effect size and corresponding classification guidelines for polytomous data with the POLYSIBTEST procedure and compare those guidelines with prior recommendations. Two simulation studies were included. The first identifies new unstandardized test heuristics for classifying moderate and large differential item functioning (DIF) for polytomous response data with three to seven response options. These are provided for researchers studying polytomous data using POLYSIBTEST software that has been published previously. The second simulation study provides one pair of standardized effect size heuristics that can be employed with items having any number of response options and compares true-positive and false-positive rates for the standardized effect size proposed by Weese with one proposed by Zwick et al. and two unstandardized classification procedures (Gierl; Golia). All four procedures retained false-positive rates generally below the level of significance at both moderate and large DIF levels. However, Weese's standardized effect size was not affected by sample size and provided slightly higher true-positive rates than the Zwick et al. and Golia's recommendations, while flagging substantially fewer items that might be characterized as having negligible DIF when compared with Gierl's suggested criterion. The proposed effect size allows for easier use and interpretation by practitioners as it can be applied to items with any number of response options and is interpreted as a difference in standard deviation units.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972129/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summary Intervals for Model-Based Classification Accuracy and Consistency Indices.","authors":"Oscar Gonzalez","doi":"10.1177/00131644221092347","DOIUrl":"https://doi.org/10.1177/00131644221092347","url":null,"abstract":"<p><p>When scores are used to make decisions about respondents, it is of interest to estimate classification accuracy (CA), the probability of making a correct decision, and classification consistency (CC), the probability of making the same decision across two parallel administrations of the measure. Model-based estimates of CA and CC computed from the linear factor model have been recently proposed, but parameter uncertainty of the CA and CC indices has not been investigated. This article demonstrates how to estimate percentile bootstrap confidence intervals and Bayesian credible intervals for CA and CC indices, which have the added benefit of incorporating the sampling variability of the parameters of the linear factor model to summary intervals. Results from a small simulation study suggest that percentile bootstrap confidence intervals have appropriate confidence interval coverage, although displaying a small negative bias. However, Bayesian credible intervals with diffused priors have poor interval coverage, but their coverage improves once empirical, weakly informative priors are used. The procedures are illustrated by estimating CA and CC indices from a measure used to identify individuals low on mindfulness for a hypothetical intervention, and R code is provided to facilitate the implementation of the procedures.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972125/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10823910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Stopping Criterion for Rasch Trees Based on the Mantel-Haenszel Effect Size Measure for Differential Item Functioning.","authors":"Mirka Henninger, Rudolf Debelak, Carolin Strobl","doi":"10.1177/00131644221077135","DOIUrl":"10.1177/00131644221077135","url":null,"abstract":"<p><p>To detect differential item functioning (DIF), Rasch trees search for optimal splitpoints in covariates and identify subgroups of respondents in a data-driven way. To determine whether and in which covariate a split should be performed, Rasch trees use statistical significance tests. Consequently, Rasch trees are more likely to label small DIF effects as significant in larger samples. This leads to larger trees, which split the sample into more subgroups. What would be more desirable is an approach that is driven more by effect size rather than sample size. In order to achieve this, we suggest to implement an additional stopping criterion: the popular Educational Testing Service (ETS) classification scheme based on the Mantel-Haenszel odds ratio. This criterion helps us to evaluate whether a split in a Rasch tree is based on a substantial or an ignorable difference in item parameters, and it allows the Rasch tree to stop growing when DIF between the identified subgroups is small. Furthermore, it supports identifying DIF items and quantifying DIF effect sizes in each split. Based on simulation results, we conclude that the Mantel-Haenszel effect size further reduces unnecessary splits in Rasch trees under the null hypothesis, or when the sample size is large but DIF effects are negligible. To make the stopping criterion easy-to-use for applied researchers, we have implemented the procedure in the statistical software R. Finally, we discuss how DIF effects between different nodes in a Rasch tree can be interpreted and emphasize the importance of purification strategies for the Mantel-Haenszel procedure on tree stopping and DIF item classification.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.1,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806517/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing Essential Unidimensionality of Scales and Structural Coefficient Bias.","authors":"Xiaoling Liu, Pei Cao, Xinzhen Lai, Jianbing Wen, Yanyun Yang","doi":"10.1177/00131644221075580","DOIUrl":"10.1177/00131644221075580","url":null,"abstract":"<p><p>Percentage of uncontaminated correlations (PUC), explained common variance (ECV), and omega hierarchical (ω<sub>H</sub>) have been used to assess the degree to which a scale is essentially unidimensional and to predict structural coefficient bias when a unidimensional measurement model is fit to multidimensional data. The usefulness of these indices has been investigated in the context of bifactor models with balanced structures. This study extends the examination by focusing on bifactor models with unbalanced structures. The maximum and minimum PUC values given the total number of items and factors were derived. The usefulness of PUC, ECV, and ω<sub>H</sub> in predicting structural coefficient bias was examined under a variety of structural regression models with bifactor measurement components. Results indicated that the performance of these indices in predicting structural coefficient bias depended on whether the bifactor measurement model had a balanced or unbalanced structure. PUC failed to predict structural coefficient bias when the bifactor model had an unbalanced structure. ECV performed reasonably well, but worse than ω<sub>H</sub>.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9806515/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10489717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}