What Affects the Quality of Score Transformations? Potential Issues in True-Score Equating Using the Partial Credit Model
Carolina Fellinghauer, Rudolf Debelak, Carolin Strobl
Educational and Psychological Measurement, December 2023. DOI: 10.1177/00131644221143051. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638984/pdf/

Abstract: This simulation study investigated to what extent departures from construct similarity, as well as differences in the difficulty and targeting of scales, affect the score transformation when scales are equated by concurrent calibration using the partial credit model with a common person design. Practical implications of the simulation results are discussed with a focus on scale equating in health-related research settings. The study simulated data for two scales, varying the number of items and the sample sizes. The factor correlation between the scales operationalized construct similarity. Targeting was operationalized through increasing departures from equal difficulty and by varying the dispersion of the item and person parameters in each scale. The results show that low similarity between scales goes along with lower transformation precision. At equal levels of similarity, precision improves in settings where the range of the item parameters encompasses the range of the person parameters. With decreasing similarity, score transformation precision benefits more from good targeting. Difficulty shifts of up to two logits somewhat increased the estimation bias but did not affect transformation precision. The observed robustness against difficulty shifts supports the advantage of applying a true-score equating method over identity equating, which was used as a naive baseline for comparison. Finally, larger sample sizes did not improve transformation precision in this study, and longer scales improved the quality of the equating only marginally. The insights from the simulation study are applied in a real-data example.
{"title":"Procedures for Analyzing Multidimensional Mixture Data.","authors":"Hsu-Lin Su, Po-Hsi Chen","doi":"10.1177/00131644231151470","DOIUrl":"10.1177/00131644231151470","url":null,"abstract":"<p><p>The multidimensional mixture data structure exists in many test (or inventory) conditions. Heterogeneity also relatively exists in populations. Still, some researchers are interested in deciding to which subpopulation a participant belongs according to the participant's factor pattern. Thus, in this study, we proposed three analysis procedures based on the factor mixture model to analyze data in the multidimensional mixture context. Simulations were manipulated with different levels of factor numbers, factor correlations, numbers of latent classes, and class separation. Issues with regard to model selection were discussed at first. The results showed that in the two-class situations the procedures of \"factor structure first then class number\" (Procedure 1) and \"factor structure and class number considered simultaneously\" (Procedure 3) performed better than the \"class number first then factor structure\" (Procedure 2) and yielded precise parameter estimation and classification accuracy. It would be appropriate to choose Procedures 1 and 3 when strong measurement invariance is assumed while using an information criterion, but Procedure 1 saved more time than Procedure 3. In the three-class situations, the performance of all three procedures was limited. Implementations and suggestions have been addressed in this research.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638979/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48059643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Explanatory Multidimensional Random Item Effects Rating Scale Model.","authors":"Sijia Huang, Jinwen Jevan Luo, Li Cai","doi":"10.1177/00131644221140906","DOIUrl":"10.1177/00131644221140906","url":null,"abstract":"<p><p>Random item effects item response theory (IRT) models, which treat both person and item effects as random, have received much attention for more than a decade. The random item effects approach has several advantages in many practical settings. The present study introduced an explanatory multidimensional random item effects rating scale model. The proposed model was formulated under a novel parameterization of the nominal response model (NRM), and allows for flexible inclusion of person-related and item-related covariates (e.g., person characteristics and item features) to study their impacts on the person and item latent variables. A new variant of the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm designed for latent variable models with crossed random effects was applied to obtain parameter estimates for the proposed model. A preliminary simulation study was conducted to evaluate the performance of the MH-RM algorithm for estimating the proposed model. Results indicated that the model parameters were well recovered. An empirical data set was analyzed to further illustrate the usage of the proposed model.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638980/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41340323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional Approaches for Modeling Unfolding Data.","authors":"George Engelhard","doi":"10.1177/00131644221143474","DOIUrl":"10.1177/00131644221143474","url":null,"abstract":"<p><p>The purpose of this study is to introduce a functional approach for modeling unfolding response data. Functional data analysis (FDA) has been used for examining cumulative item response data, but a functional approach has not been systematically used with unfolding response processes. A brief overview of FDA is presented and illustrated within the context of unfolding data. Seven decision parameters are described that can provide a guide to conducting FDA in this context. These decision parameters are illustrated with real data using two scales that are designed to measure attitude toward capital punishment and attitude toward censorship. The analyses suggest that FDA offers a useful set of tools for examining unfolding response processes.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638986/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42770061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Effects of Missing Data Handling Methods on Scale Linking Accuracy.","authors":"Tong Wu, Stella Y Kim, Carl Westine","doi":"10.1177/00131644221140941","DOIUrl":"10.1177/00131644221140941","url":null,"abstract":"<p><p>For large-scale assessments, data are often collected with missing responses. Despite the wide use of item response theory (IRT) in many testing programs, however, the existing literature offers little insight into the effectiveness of various approaches to handling missing responses in the context of scale linking. Scale linking is commonly used in large-scale assessments to maintain scale comparability over multiple forms of a test. Under a common-item nonequivalent group design (CINEG), missing data that occur to common items potentially influence the linking coefficients and, consequently, may affect scale comparability, test validity, and reliability. The objective of this study was to evaluate the effect of six missing data handling approaches, including listwise deletion (LWD), treating missing data as incorrect responses (IN), corrected item mean imputation (CM), imputing with a response function (RF), multiple imputation (MI), and full information likelihood information (FIML), on IRT scale linking accuracy when missing data occur to common items. Under a set of simulation conditions, the relative performance of the six missing data treatment methods under two missing mechanisms was explored. Results showed that RF, MI, and FIML produced less errors for conducting scale linking whereas LWD was associated with the most errors regardless of various testing conditions.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638981/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49647903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why Do Regular and Reversed Items Load on Separate Factors? Response Difficulty vs. Item Extremity.","authors":"Chester Chun Seng Kam","doi":"10.1177/00131644221143972","DOIUrl":"10.1177/00131644221143972","url":null,"abstract":"<p><p>When constructing measurement scales, regular and reversed items are often used (e.g., \"I am satisfied with my job\"/\"I am not satisfied with my job\"). Some methodologists recommend excluding reversed items because they are more difficult to understand and therefore engender a second, artificial factor distinct from the regular-item factor. The current study compares two explanations for why a construct's dimensionality may become distorted: response difficulty and item extremity. Two types of reversed items were created: negation items (\"The conditions of my life are not good\") and polar opposites (\"The conditions of my life are bad\"), with the former type having higher response difficulty. When extreme wording was used (e.g., \"excellent/terrible\" instead of \"good/bad\"), negation items did not load on a factor distinct from regular items, but polar opposites did. Results thus support item extremity over response difficulty as an explanation for dimensionality distortion. Given that scale developers seldom check for extremity, it is unsurprising that regular and polar opposite items often load on distinct factors.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638982/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42489941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Modeling Missing Data in Structural Investigations Based on Tetrachoric Correlations With Free and Fixed Factor Loadings.","authors":"Karl Schweizer, Andreas Gold, Dorothea Krampen","doi":"10.1177/00131644221143145","DOIUrl":"10.1177/00131644221143145","url":null,"abstract":"<p><p>In modeling missing data, the missing data latent variable of the confirmatory factor model accounts for systematic variation associated with missing data so that replacement of what is missing is not required. This study aimed at extending the modeling missing data approach to tetrachoric correlations as input and at exploring the consequences of switching between models with free and fixed factor loadings. In a simulation study, confirmatory factor analysis (CFA) models with and without a missing data latent variable were used for investigating the structure of data with and without missing data. In addition, the numbers of columns of data sets with missing data and the amount of missing data were varied. The root mean square error of approximation (RMSEA) results revealed that an additional missing data latent variable recovered the degree-of-model fit characterizing complete data when tetrachoric correlations served as input while comparative fit index (CFI) results showed overestimation of this degree-of-model fit. Whereas the results for fixed factor loadings were in line with the assumptions of modeling missing data, the other results showed only partial agreement. Therefore, modeling missing data with fixed factor loadings is recommended.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638985/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47544581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Note on Statistical Hypothesis Testing: Probabilifying <i>Modus Tollens</i> Invalidates Its Force? Not True!","authors":"Keith F Widaman","doi":"10.1177/00131644221145132","DOIUrl":"10.1177/00131644221145132","url":null,"abstract":"<p><p>The import or force of the result of a statistical test has long been portrayed as consistent with deductive reasoning. The simplest form of deductive argument has a first premise with conditional form, such as <i>p</i>→<i>q</i>, which means that \"if <i>p</i> is true, then <i>q</i> must be true.\" Given the first premise, one can either affirm or deny the antecedent clause (<i>p</i>) or affirm or deny the consequent claim (<i>q</i>). This leads to four forms of deductive argument, two of which are valid forms of reasoning and two of which are invalid. The typical conclusion is that only a single form of argument-denying the consequent, also known as <i>modus tollens</i>-is a reasonable analog of decisions based on statistical hypothesis testing. Now, statistical evidence is never certain, but is associated with a probability (i.e., a <i>p</i>-level). Some have argued that <i>modus tollens</i>, when probabilified, loses its force and leads to ridiculous, nonsensical conclusions. Their argument is based on specious problem setup. This note is intended to correct this error and restore the position of <i>modus tollens</i> as a valid form of deductive inference in statistical matters, even when it is probabilified.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10638983/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43119306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Utility of Indirect Methods for Detecting Faking
Philippe Goldammer, Peter Lucas Stöckli, Yannik Andrea Escher, Hubert Annen, Klaus Jonas
Educational and Psychological Measurement, published November 13, 2023. DOI: 10.1177/00131644231209520

Abstract: Indirect indices for faking detection in questionnaires use a respondent's deviant or unlikely response pattern over the course of the questionnaire to identify the respondent as a faker. Compared with established direct faking indices (i.e., lying and social desirability scales), indirect indices have at least two advantages: first, they cannot be detected by the test taker; second, their use does not require changes to the questionnaire. In the last decades, several such indirect indices have been proposed. At present, however, the choice between different indirect faking detection indices is guided by relatively little information, especially if conceptually different indices are to be used together. We therefore examined how well a representative selection of 12 conceptually different indirect indices performs, individually and jointly, compared with an established direct faking measure (validity scale). We found that, first, the score on the agreement factor of the Likert-type item response process tree model, the proportion of desirable scale endpoint responses, and the covariance index were the best-performing indirect indices. Second, using indirect indices in combination resulted in detection rates comparable to, and in some cases better than, those of direct faking measures. Third, some effective indirect indices were only minimally correlated with substantive scales and could therefore be used to partial faking variance out of responses without losing substance. We therefore encourage researchers to use indirect indices instead of direct faking measures when they aim to detect faking in their data.
{"title":"Investigating Heterogeneity in Response Strategies: A Mixture Multidimensional IRTree Approach","authors":"Ö. Emre C. Alagöz, Thorsten Meiser","doi":"10.1177/00131644231206765","DOIUrl":"https://doi.org/10.1177/00131644231206765","url":null,"abstract":"To improve the validity of self-report measures, researchers should control for response style (RS) effects, which can be achieved with IRTree models. A traditional IRTree model considers a response as a combination of distinct decision-making processes, where the substantive trait affects the decision on response direction, while decisions about choosing the middle category or extreme categories are largely determined by midpoint RS (MRS) and extreme RS (ERS). One limitation of traditional IRTree models is the assumption that all respondents utilize the same set of RS in their response strategies, whereas it can be assumed that the nature and the strength of RS effects can differ between individuals. To address this limitation, we propose a mixture multidimensional IRTree (MM-IRTree) model that detects heterogeneity in response strategies. The MM-IRTree model comprises four latent classes of respondents, each associated with a different set of RS traits in addition to the substantive trait. More specifically, the class-specific response strategies involve (1) only ERS in the “ERS only” class, (2) only MRS in the “MRS only” class, (3) both ERS and MRS in the “2RS” class, and (4) neither ERS nor MRS in the “0RS” class. In a simulation study, we showed that the MM-IRTree model performed well in recovering model parameters and class memberships, whereas the traditional IRTree approach showed poor performance if the population includes a mixture of response strategies. In an application to empirical data, the MM-IRTree model revealed distinct classes with noticeable class sizes, suggesting that respondents indeed utilize different response strategies.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135242059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}