{"title":"Evaluating the Construct Validity of Instructional Manipulation Checks as Measures of Careless Responding to Surveys.","authors":"Mark C Ramsey, Nathan A Bowling, Preston S Menke","doi":"10.1177/01466216241284293","DOIUrl":"10.1177/01466216241284293","url":null,"abstract":"<p><p>Careless responding measures are important for several purposes, whether it's screening for careless responding or for research centered on careless responding as a substantive variable. One such approach for assessing carelessness in surveys is the use of an instructional manipulation check. Despite its apparent popularity, little is known about the construct validity of instructional manipulation checks as measures of careless responding. Initial results are inconclusive, and no study has thoroughly evaluated the validity of the instructional manipulation check as a measure of careless responding. Across 2 samples (<i>N</i> = 762), we evaluated the construct validity of the instructional manipulation check under a nomological network. We found that the instructional manipulation check converged poorly with other measures of careless responding, weakly predicted participant inability to recognize study content, and did not display incremental validity over existing measures of careless responding. Additional analyses revealed that instructional manipulation checks performed poorly compared to single scores of other alternative careless responding measures and that screening data with alternative measures of careless responding produced greater or similar gains in data quality to instructional manipulation checks. 
Based on the results of our studies, we do not recommend using instructional manipulation checks to assess or screen for careless responding to surveys.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501094/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142510499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Test-Retest Reliability in the Presence of Self-Selection Bias and Learning/Practice Effects.","authors":"William C M Belzak, J R Lockwood","doi":"10.1177/01466216241284585","DOIUrl":"https://doi.org/10.1177/01466216241284585","url":null,"abstract":"<p><p>Test-retest reliability is often estimated using naturally occurring data from test repeaters. In settings such as admissions testing, test takers choose if and when to retake an assessment. This self-selection can bias estimates of test-retest reliability because individuals who choose to retest are typically unrepresentative of the broader testing population and because differences among test takers in learning or practice effects may increase with time between test administrations. We develop a set of methods for estimating test-retest reliability from observational data that can mitigate these sources of bias, which include sample weighting, polynomial regression, and Bayesian model averaging. We demonstrate the value of using these methods for reducing bias and improving precision of estimated reliability using empirical and simulated data, both of which are based on more than 40,000 repeaters of a high-stakes English language proficiency test. 
Finally, these methods generalize to settings in which only a single, error-prone measurement is taken repeatedly over time and where self-selection and/or changes to the underlying construct may be at play.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528726/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142569674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
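The sample-weighting idea in the record above can be sketched as a weighted Pearson correlation between first and second test scores, where weights (e.g., from an inverse propensity-of-retaking model, not shown here) up-weight under-represented repeaters. The scores and weights below are hypothetical, and this is only a minimal illustration of the weighting step, not the authors' full method:

```python
import math

def weighted_correlation(x, y, w):
    """Weighted Pearson correlation between paired scores x and y.

    With inverse-propensity weights w, this approximates the test-retest
    correlation in the full testing population rather than among
    self-selected repeaters only.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / math.sqrt(vx * vy)

# Hypothetical scores for five repeaters; larger weights mark takers
# who are under-represented among voluntary retesters.
t1 = [520, 540, 580, 610, 650]
t2 = [530, 545, 570, 620, 640]
w = [2.0, 1.5, 1.0, 1.0, 0.8]
print(round(weighted_correlation(t1, t2, w), 3))
```

With all weights equal, the function reduces to the ordinary Pearson correlation.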
{"title":"A Mark-Recapture Approach to Estimating Item Pool Compromise.","authors":"Richard A Feinberg","doi":"10.1177/01466216241284410","DOIUrl":"10.1177/01466216241284410","url":null,"abstract":"<p><p>Testing organizations routinely investigate if secure exam material has been compromised and is consequently invalid for scoring and inclusion on future assessments. Beyond identifying individual compromised items, knowing the degree to which a form is compromised can inform decisions on whether the form can no longer be administered or when an item pool is compromised to such an extent that serious action on a broad scale must be taken to ensure the validity of score interpretations. Previous research on estimating the population of item compromise is sparse; however, this is a more generally long-studied problem in ecological research. In this note, we exemplify the utility of the mark-recapture technique to estimate the population of compromised items, first through a brief demonstration to introduce the fundamental concepts and then a more realistic scenario to illustrate applicability to large-scale testing programs. An effective use of this technique would be to longitudinally track changes in the estimated population to inform operational test security strategies. Many variations on mark-recapture exist and interpretation of the estimated population depends on several factors. 
Thus, this note is only meant to introduce the concept of mark-recapture as a useful application to evaluate a testing organization's compromise mitigation procedures.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528777/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142569673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
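The mark-recapture idea in the record above can be illustrated with the classic Lincoln-Petersen estimator (Chapman's bias-corrected variant): treat two independent compromise-detection sweeps as the "capture" occasions, with the overlap playing the role of recaptured animals. The note itself does not specify which estimator it uses, and the item counts below are hypothetical:

```python
def chapman_estimate(n_marked, n_caught, n_recaptured):
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    n_marked: items flagged as compromised in a first detection sweep
    n_caught: items flagged in a second, independent sweep
    n_recaptured: items flagged in both sweeps
    """
    return (n_marked + 1) * (n_caught + 1) / (n_recaptured + 1) - 1

# Hypothetical sweeps: 60 items found first, 50 found second, 20 in both.
est = chapman_estimate(60, 50, 20)
print(round(est))  # -> 147 estimated compromised items in the pool
```

A large estimate relative to the number of items actually detected would suggest that many compromised items remain unidentified, which is the kind of signal the note proposes tracking longitudinally.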
{"title":"Effect of Differential Item Functioning on Computer Adaptive Testing Under Different Conditions.","authors":"Merve Sahin Kursad, Seher Yalcin","doi":"10.1177/01466216241284295","DOIUrl":"10.1177/01466216241284295","url":null,"abstract":"<p><p>This study provides an overview of the effect of differential item functioning (DIF) on measurement precision, test information function (TIF), and test effectiveness in computer adaptive tests (CATs). Simulated data for the study was produced and analyzed with the Rstudio. During the data generation process, item pool size, DIF type, DIF percentage, item selection method for CAT, and the test termination rules were considered changed conditions. Sample size and ability parameter distribution, Item Response Theory (IRT) model, DIF size, ability estimation method, test starting rule, and item usage frequency method regarding CAT conditions were considered fixed conditions. To examine the effect of DIF, measurement precision, TIF and test effectiveness were calculated. Results show DIF has negative effects on measurement precision, TIF, and test effectiveness. In particular, statistically significant effects of the percentage DIF items and DIF type are observed on measurement precision.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501093/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142510498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Item Response Modeling of Clinical Instruments With Filter Questions: Disentangling Symptom Presence and Severity.","authors":"Brooke E Magnus","doi":"10.1177/01466216241261709","DOIUrl":"10.1177/01466216241261709","url":null,"abstract":"<p><p>Clinical instruments that use a filter/follow-up response format often produce data with excess zeros, especially when administered to nonclinical samples. When the unidimensional graded response model (GRM) is then fit to these data, parameter estimates and scale scores tend to suggest that the instrument measures individual differences only among individuals with severe levels of the psychopathology. In such scenarios, alternative item response models that explicitly account for excess zeros may be more appropriate. The multivariate hurdle graded response model (MH-GRM), which has been previously proposed for handling zero-inflated questionnaire data, includes two latent variables: susceptibility, which underlies responses to the filter question, and severity, which underlies responses to the follow-up question. Using both simulated and empirical data, the current research shows that compared to unidimensional GRMs, the MH-GRM is better able to capture individual differences across a wider range of psychopathology, and that when unidimensional GRMs are fit to data from questionnaires that include filter questions, individual differences at the lower end of the severity continuum largely go unmeasured. 
Practical implications are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11331747/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142009739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
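The GRM that the record above builds on defines cumulative probabilities P(X ≥ k | θ) = logistic(a(θ − bₖ)) and obtains category probabilities as differences of adjacent cumulative probabilities. A minimal sketch of that standard computation follows; the item parameters are hypothetical, and this shows only the plain GRM, not the hurdle extension:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    thresholds must be ordered b_1 < b_2 < ... ; the cumulative curves
    P(X >= k) = logistic(a * (theta - b_k)) are bracketed by 1 and 0,
    and category probabilities are differences of adjacent curves.
    """
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical 4-category item with discrimination 1.5.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.2])
print([round(p, 3) for p in probs])  # four probabilities summing to 1
```

In a hurdle formulation, a separate filter-question model would first govern whether the response is zero at all, and these graded probabilities would apply only past that hurdle.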
{"title":"A Note on Standard Errors for Multidimensional Two-Parameter Logistic Models Using Gaussian Variational Estimation","authors":"Jiaying Xiao, Chun Wang, Gongjun Xu","doi":"10.1177/01466216241265757","DOIUrl":"https://doi.org/10.1177/01466216241265757","url":null,"abstract":"Accurate item parameters and standard errors (SEs) are crucial for many multidimensional item response theory (MIRT) applications. A recent study proposed the Gaussian Variational Expectation Maximization (GVEM) algorithm to improve computational efficiency and estimation accuracy ( Cho et al., 2021 ). However, the SE estimation procedure has yet to be fully addressed. To tackle this issue, the present study proposed an updated supplemented expectation maximization (USEM) method and a bootstrap method for SE estimation. These two methods were compared in terms of SE recovery accuracy. The simulation results demonstrated that the GVEM algorithm with bootstrap and item priors (GVEM-BSP) outperformed the other methods, exhibiting less bias and relative bias for SE estimates under most conditions. Although the GVEM with USEM (GVEM-USEM) was the most computationally efficient method, it yielded an upward bias for SE estimates.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141809630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurement Invariance Testing Works","authors":"J. Lasker","doi":"10.1177/01466216241261708","DOIUrl":"https://doi.org/10.1177/01466216241261708","url":null,"abstract":"Psychometricians have argued that measurement invariance (MI) testing is needed to know if the same psychological constructs are measured in different groups. Data from five experiments allowed that position to be tested. In the first, participants answered questionnaires on belief in free will and either the meaning of life or the meaning of a nonsense concept called “gavagai.” Since the meaning of life and the meaning of gavagai conceptually differ, MI should have been violated when groups were treated like their measurements were identical. MI was severely violated, indicating the questionnaires were interpreted differently. In the second and third experiments, participants were randomized to watch treatment videos explaining figural matrices rules or task-irrelevant control videos. Participants then took intelligence and figural matrices tests. The intervention worked and the experimental group had an additional influence on figural matrix performance in the form of knowing matrix rules, so their performance on the matrices tests violated MI and was anomalously high for their intelligence levels. In both experiments, MI was severely violated. In the fourth and fifth experiments, individuals were exposed to growth mindset interventions that a twin study revealed changed the amount of genetic variance in the target mindset measure without affecting other variables. When comparing treatment and control groups, MI was attainable before but not after treatment. Moreover, the control group showed longitudinal invariance, but the same was untrue for the treatment group. 
MI testing is likely able to show if the same things are measured in different groups.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141343348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accommodating and Extending Various Models for Special Effects Within the Generalized Partially Confirmatory Factor Analysis Framework","authors":"Yifan Zhang, Jinsong Chen","doi":"10.1177/01466216241261704","DOIUrl":"https://doi.org/10.1177/01466216241261704","url":null,"abstract":"Special measurement effects including the method and testlet effects are common issues in educational and psychological measurement. They are typically covered by various bifactor models or models for the multiple traits multiple methods (MTMM) structure for continuous data and by various testlet effect models for categorical data. However, existing models have some limitations in accommodating different type of effects. With slight modification, the generalized partially confirmatory factor analysis (GPCFA) framework can flexibly accommodate special effects for continuous and categorical cases with added benefits. Various bifactor, MTMM and testlet effect models can be linked to different variants of the revised GPCFA model. Compared to existing approaches, GPCFA offers multidimensionality for both the general and effect factors (or traits) and can address local dependence, mixed-type formats, and missingness jointly. Moreover, the partially confirmatory approach allows for regularization of the loading patterns, resulting in a simpler structure in both the general and special parts. We also provide a subroutine to compute the equivalent effect size. 
Simulation studies and real-data examples are used to demonstrate the performance and usefulness of the proposed approach under different situations.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141353380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Directional Invariance in an Item Response Tree Model for Extreme Response Style and Trait-Based Unfolding Responses","authors":"Siqi He, Justin L. Kern","doi":"10.1177/01466216241261705","DOIUrl":"https://doi.org/10.1177/01466216241261705","url":null,"abstract":"Item response tree (IRTree) approaches have received increasing attention in the response style literature due to their capability to partial out response style latent traits from content-related latent traits by considering separate decisions for agreement and level of agreement. Additionally, it has shown that the functioning of the intensity of agreement decision may depend upon the agreement decision with an item, so that the item parameters and person parameters may differ by direction of agreement; when the parameters across direction are the same, this is called directional invariance. Furthermore, for non-cognitive psychological constructs, it has been argued that the response process may be best described as following an unfolding process. In this study, a family of IRTree models to handle unfolding responses with the agreement decision following the hyperbolic cosine model and the intensity of agreement decision following a graded response model is investigated. This model family also allows for investigation of item- and person-level directional invariance. A simulation study is conducted to evaluate parameter recovery; model parameters are estimated with a fully Bayesian approach using JAGS (Just Another Gibbs Sampler). The proposed modeling scheme is demonstrated with two data examples with multiple model comparisons allowing for varying levels of directional invariance and unfolding versus dominance processes. An approach to visualizing the final model item response functioning is also developed. 
The article closes with a short discussion about the results.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141356467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"aberrance: An R Package for Detecting Aberrant Behavior in Test Data","authors":"Kylie Gorney, Jiayi Deng","doi":"10.1177/01466216241261707","DOIUrl":"https://doi.org/10.1177/01466216241261707","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141385802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}