{"title":"A Note on Evaluation of Polytomous Item Locations With the Rating Scale Model and Testing Its Fit","authors":"Tenko Raykov, Martin Pusic","doi":"10.1177/00131644241259026","DOIUrl":"https://doi.org/10.1177/00131644241259026","url":null,"abstract":"A procedure is outlined for point and interval estimation of location parameters associated with polytomous items, or raters assessing studied subjects or cases, which follow the rating scale model. The method is developed within the framework of latent variable modeling, and is readily applied in empirical research using popular software. The approach permits testing the goodness of fit of this widely used model, which represents a rather parsimonious item response theory model as a means of description and explanation of an analyzed data set. The procedure allows examination of important aspects of the functioning of measuring instruments with polytomous ordinal items, which may also constitute person assessments furnished by teachers, counselors, judges, raters, or clinicians. The described method is illustrated using an empirical example.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141501614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear and Nonlinear Indices of Score Accuracy and Item Effectiveness for Measures That Contain Locally Dependent Items","authors":"P. J. Ferrando, D. Navarro-González, F. Morales-Vives","doi":"10.1177/00131644241257602","DOIUrl":"https://doi.org/10.1177/00131644241257602","url":null,"abstract":"The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the scores based on these extended solutions have received little attention so far. Here, we propose an approach to simple sum scores, designed to assess the impact of LIDs on the accuracy and effectiveness of the scores derived from extended FA solutions with correlated residuals. The proposal is structured at three levels—(a) total score, (b) bivariate-doublet, and (c) item-by-item deletion—and considers two types of FA models: the standard linear model and the nonlinear model for ordered-categorical item responses. The current proposal is implemented in SINRELEF.LD, an R package available through CRAN. The usefulness of the proposal for item analysis is illustrated with the data of 928 participants who completed the Family Involvement Questionnaire-High School Version (FIQ-HS). The results show not only the distortion that the doublets cause in the omega reliability estimate when local independency is assumed but also the loss of information/efficiency due to the local dependencies.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141348988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why Forced-Choice and Likert Items Provide the Same Information on Personality, Including Social Desirability.","authors":"Martin Bäckström, Fredrik Björklund","doi":"10.1177/00131644231178721","DOIUrl":"10.1177/00131644231178721","url":null,"abstract":"<p><p>The forced-choice response format is often considered superior to the standard Likert-type format for controlling social desirability in personality inventories. We performed simulations and found that the trait information based on the two formats converges when the number of items is high and forced-choice items are mixed with regard to positively and negatively keyed items. Given that forced-choice items extract the same personality information as Likert-type items do, including socially desirable responding, other means are needed to counteract social desirability. We propose using evaluatively neutralized items in personality measurement, as they can counteract social desirability regardless of response format.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095325/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44637778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Multiple Imputation to Account for the Uncertainty Due to Missing Data in the Context of Factor Retention.","authors":"Yan Xia, Selim Havan","doi":"10.1177/00131644231178800","DOIUrl":"10.1177/00131644231178800","url":null,"abstract":"<p><p>Although parallel analysis has been found to be an accurate method for determining the number of factors in many conditions with complete data, its application under missing data is limited. The existing literature recommends that, after using an appropriate multiple imputation method, researchers either apply parallel analysis to every imputed data set and use the number of factors suggested by most of the data copies or average the correlation matrices across all data copies, followed by applying the parallel analysis to the average correlation matrix. Both approaches for pooling the results provide a single suggested number without reflecting the uncertainty introduced by missing values. The present study proposes the use of an alternative approach, which calculates the proportion of imputed data sets that result in <i>k</i> (<i>k</i> = 1, 2, 3 . . .) factors. This approach will inform applied researchers of the degree of uncertainty due to the missingness. Results from a simulation experiment show that the proposed method can more likely suggest the correct number of factors when missingness contributes to a large amount of uncertainty.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095323/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46745523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Equating Methods for Varying Levels of Form Difference.","authors":"Ting Sun, Stella Yun Kim","doi":"10.1177/00131644231176989","DOIUrl":"10.1177/00131644231176989","url":null,"abstract":"<p><p>Equating is a statistical procedure used to adjust for the difference in form difficulty such that scores on those forms can be used and interpreted comparably. In practice, however, equating methods are often implemented without considering the extent to which two forms differ in difficulty. The study aims to examine the effect of the magnitude of a form difficulty difference on equating results under random group (RG) and common-item nonequivalent group (CINEG) designs. Specifically, this study evaluates the performance of six equating methods under a set of simulation conditions including varying levels of form difference. Results revealed that, under the RG design, mean equating was proven to be the most accurate method when there is no or small form difference, whereas equipercentile is the most accurate method when the difficulty difference is medium or large. Under the CINEG design, Tucker Linear was found to be the most accurate method when the difficulty difference is medium or small, and either chained equipercentile or frequency estimation is preferred with a large difficulty level. This study would provide practitioners with research evidence-based guidance in the choice of equating methods with varying levels of form difference. As the condition of no form difficulty difference is also included, this study would inform testing companies of appropriate equating methods when two forms are similar in difficulty level.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095324/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46627790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can People With Higher Versus Lower Scores on Impression Management or Self-Monitoring Be Identified Through Different Traces Under Faking?","authors":"Jessica Röhner, Philipp Thoss, Liad Uziel","doi":"10.1177/00131644231182598","DOIUrl":"10.1177/00131644231182598","url":null,"abstract":"<p><p>According to faking models, personality variables and faking are related. Most prominently, people's tendency to try to make an appropriate impression (impression management; IM) and their tendency to adjust the impression they make (self-monitoring; SM) have been suggested to be associated with faking. Nevertheless, empirical findings connecting these personality variables to faking have been contradictory, partly because different studies have given individuals different tests to fake and different faking directions (to fake low vs. high scores). Importantly, whereas past research has focused on faking by examining test scores, recent advances have suggested that the faking process could be better understood by analyzing individuals' responses at the item level (response pattern). Using machine learning (elastic net and random forest regression), we reanalyzed a data set (<i>N</i> = 260) to investigate whether individuals' faked response patterns on extraversion (features; i.e., input variables) could reveal their IM and SM scores. We found that individuals had similar response patterns when they faked, irrespective of their IM scores (excluding the faking of high scores when random forest regression was used). Elastic net and random forest regression converged in revealing that individuals higher on SM differed from individuals lower on SM in how they faked. Thus, response patterns were able to reveal individuals' SM, but not IM. Feature importance analyses showed that whereas some items were faked differently by individuals with higher versus lower SM scores, others were faked similarly. Our results imply that analyses of response patterns offer valuable new insights into the faking process.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095321/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47440034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Item Response Theory Model for Incorporating Response Times in Forced-Choice Measures.","authors":"Zhichen Guo, Daxun Wang, Yan Cai, Dongbo Tu","doi":"10.1177/00131644231171193","DOIUrl":"10.1177/00131644231171193","url":null,"abstract":"<p><p>Forced-choice (FC) measures have been widely used in many personality or attitude tests as an alternative to rating scales, which employ comparative rather than absolute judgments. Several response biases, such as social desirability, response styles, and acquiescence bias, can be reduced effectively. Another type of data linked with comparative judgments is response time (RT), which contains potential information concerning respondents' decision-making process. It would be challenging but exciting to combine RT into FC measures better to reveal respondents' behaviors or preferences in personality measurement. Given this situation, this study aims to propose a new item response theory (IRT) model that incorporates RT into FC measures to improve personality assessment. Simulation studies show that the proposed model can effectively improve the estimation accuracy of personality traits with the ancillary information contained in RT. Also, an application on a real data set reveals that the proposed model estimates similar but different parameter values compared with the conventional Thurstonian IRT model. The RT information can explain these differences.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095319/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43885429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring Unipolar Traits With Continuous Response Items: Some Methodological and Substantive Developments.","authors":"Pere J Ferrando, Fabia Morales-Vives, Ana Hernández-Dorado","doi":"10.1177/00131644231181889","DOIUrl":"10.1177/00131644231181889","url":null,"abstract":"<p><p>In recent years, some models for binary and graded format responses have been proposed to assess unipolar variables or \"quasi-traits.\" These studies have mainly focused on clinical variables that have traditionally been treated as bipolar traits. In the present study, we have made a proposal for unipolar traits measured with continuous response items. The proposed log-logistic continuous unipolar model (LL-C) is remarkably simple and is more similar to the original binary formulation than the graded extensions, which is an advantage. Furthermore, considering that irrational, extreme, or polarizing beliefs could be another domain of unipolar variables, we have applied this proposal to an empirical example of superstitious beliefs. The results suggest that, in certain cases, the standard linear model can be a good approximation to the LL-C model in terms of parameter estimation and goodness of fit, but not trait estimates and their accuracy. The results also show the importance of considering the unipolar nature of this kind of trait when predicting criterion variables, since the validity results were clearly different.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095320/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42691490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wald χ<sup>2</sup> Test for Differential Item Functioning Detection with Polytomous Items in Multilevel Data.","authors":"Sijia Huang, Dubravka Svetina Valdivia","doi":"10.1177/00131644231181688","DOIUrl":"10.1177/00131644231181688","url":null,"abstract":"<p><p>Identifying items with differential item functioning (DIF) in an assessment is a crucial step for achieving equitable measurement. One critical issue that has not been fully addressed with existing studies is how DIF items can be detected when data are multilevel. In the present study, we introduced a Lord's Wald <math><mrow><msup><mrow><mi>χ</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></math> test-based procedure for detecting both uniform and non-uniform DIF with polytomous items in the presence of the ubiquitous multilevel data structure. The proposed approach is a multilevel extension of a two-stage procedure, which identifies anchor items in its first stage and formally evaluates candidate items in the second stage. We applied the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm to estimate multilevel polytomous item response theory (IRT) models and to obtain accurate covariance matrices. To evaluate the performance of the proposed approach, we conducted a preliminary simulation study that considered various conditions to mimic real-world scenarios. The simulation results indicated that the proposed approach has great power for identifying DIF items and well controls the Type I error rate. Limitations and future research directions were also discussed.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095326/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42032084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Evaluation of Fit Indices Used in Model Selection of Dichotomous Mixture IRT Models.","authors":"Sedat Sen, Allan S Cohen","doi":"10.1177/00131644231180529","DOIUrl":"10.1177/00131644231180529","url":null,"abstract":"<p><p>A Monte Carlo simulation study was conducted to compare fit indices used for detecting the correct latent class in three dichotomous mixture item response theory (IRT) models. Ten indices were considered: Akaike's information criterion (AIC), the corrected AIC (AICc), Bayesian information criterion (BIC), consistent AIC (CAIC), Draper's information criterion (DIC), sample size adjusted BIC (SABIC), relative entropy, the integrated classification likelihood criterion (ICL-BIC), the adjusted Lo-Mendell-Rubin (LMR), and Vuong-Lo-Mendell-Rubin (VLMR). The accuracy of the fit indices was assessed for correct detection of the number of latent classes for different simulation conditions including sample size (2,500 and 5,000), test length (15, 30, and 45), mixture proportions (equal and unequal), number of latent classes (2, 3, and 4), and latent class separation (no-separation and small separation). Simulation study results indicated that as the number of examinees or number of items increased, correct identification rates also increased for most of the indices. Correct identification rates by the different fit indices, however, decreased as the number of estimated latent classes or parameters (i.e., model complexity) increased. Results were good for BIC, CAIC, DIC, SABIC, ICL-BIC, LMR, and VLMR, and the relative entropy index tended to select correct models most of the time. Consistent with previous studies, AIC and AICc showed poor performance. Most of these indices had limited utility for three-class and four-class mixture 3PL model conditions.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11095322/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46075824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}