{"title":"Detecting Examinees With Item Preknowledge on Real Data.","authors":"Dmitry I Belov, Sarah L Toton","doi":"10.1177/01466216221084202","DOIUrl":"https://doi.org/10.1177/01466216221084202","url":null,"abstract":"<p>Recently, Belov & Wollack (2021) developed a method for detecting groups of colluding examinees as cliques in a graph. The objective of this article is to study how the performance of their method on real data with item preknowledge (IP) depends on the mechanism of edge formation governed by a response similarity index (RSI). This study resulted in the development of three new RSIs and demonstrated a remarkable advantage of combining responses and response times for detecting examinees with IP. Possible extensions of this study and recommendations for practitioners were formulated.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9118928/pdf/10.1177_01466216221084202.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9609916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Modern and Popular Approaches to Calculating Reliability for Dichotomously Scored Items.","authors":"Sébastien Béland, Carl F Falk","doi":"10.1177/01466216221084210","DOIUrl":"https://doi.org/10.1177/01466216221084210","url":null,"abstract":"<p>Recent work on reliability coefficients has largely focused on continuous items, including critiques of Cronbach's alpha. Although two new model-based reliability coefficients have been proposed for dichotomous items (Dimitrov, 2003a,b; Green & Yang, 2009a), these approaches have yet to be compared to each other or other popular estimates of reliability such as omega, alpha, and the greatest lower bound. We seek computational improvements to one of these model-based reliability coefficients and, in addition, conduct initial Monte Carlo simulations to compare coefficients using dichotomous data. Our results suggest that such improvements to the model-based approach are warranted, while model-based approaches were generally superior.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9118929/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41659739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Potential for Interpretational Confounding in Cognitive Diagnosis Models.","authors":"Qi Helen Huang, Daniel M Bolt","doi":"10.1177/01466216221084207","DOIUrl":"https://doi.org/10.1177/01466216221084207","url":null,"abstract":"<p>Binary examinee mastery/nonmastery classifications in cognitive diagnosis models may often be an approximation to proficiencies that are better regarded as continuous. Such misspecification can lead to inconsistencies in the operational definition of \"mastery\" when binary skills models are assumed. In this paper we demonstrate the potential for an interpretational confounding of the latent skills when truly continuous skills are treated as binary. Using the DINA model as an example, we show how such forms of confounding can be observed through item and/or examinee parameter change when (1) different collections of items (such as representing different test forms) previously calibrated separately are subsequently calibrated together; and (2) when structural restrictions are placed on the relationships among skill attributes (such as the assumption of strictly nonnegative growth over time), among other possibilities. We examine these occurrences in both simulation and real data studies. 
It is suggested that researchers should regularly attend to the potential for interpretational confounding by studying differences in attribute mastery proportions and/or changes in item parameter (e.g., slip and guess) estimates attributable to skill continuity when the same samples of examinees are administered different test forms, or the same test forms are involved in different calibrations.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9118932/pdf/10.1177_01466216221084207.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9609918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing Dimensionality in Dichotomous Items When Many Subjects Have All-Zero Responses: An Example From Psychiatry and a Solution Using Mixture Models.","authors":"William F Christensen, Melanie M Wall, Irini Moustaki","doi":"10.1177/01466216211066602","DOIUrl":"https://doi.org/10.1177/01466216211066602","url":null,"abstract":"<p>Common methods for determining the number of latent dimensions underlying an item set include eigenvalue analysis and examination of fit statistics for factor analysis models with varying number of factors. Given a set of dichotomous items, the authors demonstrate that these empirical assessments of dimensionality often incorrectly estimate the number of dimensions when there is a preponderance of individuals in the sample with all-zeros as their responses, for example, not endorsing any symptoms on a health battery. Simulated data experiments are conducted to demonstrate when each of several common diagnostics of dimensionality can be expected to under- or over-estimate the true dimensionality of the underlying latent variable. An example is shown from psychiatry assessing the dimensionality of a social anxiety disorder battery where 1, 2, 3, or more factors are identified, depending on the method of dimensionality assessment. An all-zero inflated exploratory factor analysis model (AZ-EFA) is introduced for assessing the dimensionality of the underlying subgroup corresponding to those possessing the measurable trait. The AZ-EFA approach is demonstrated using simulation experiments and an example measuring social anxiety disorder from a large nationally representative survey. 
Implications of the findings are discussed, in particular, regarding the potential for different findings in community versus patient populations.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073639/pdf/10.1177_01466216211066602.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing the Misclassification Costs of Cognitive Diagnosis Computerized Adaptive Testing: Item Selection With Minimum Expected Risk.","authors":"Chia-Ling Hsu, Wen-Chung Wang","doi":"10.1177/01466216211066610","DOIUrl":"https://doi.org/10.1177/01466216211066610","url":null,"abstract":"<p>Cognitive diagnosis computerized adaptive testing (CD-CAT) aims to identify each examinee's strengths and weaknesses on latent attributes for appropriate classification into an attribute profile. As the cost of a CD-CAT misclassification differs across user needs (e.g., remedial program vs. scholarship eligibilities), item selection can incorporate such costs to improve measurement efficiency. This study proposes such a method, <i>minimum expected risk</i> (MER), based on Bayesian decision theory. According to simulations, using MER to identify examinees with no mastery (MER-U0) or full mastery (MER-U1) showed greater classification accuracy and efficiency than other methods for these attribute profiles, especially for shorter tests or low quality item banks. For other attribute profiles, regardless of item quality or termination criterion, MER methods, modified posterior-weighted Kullback-Leibler information (MPWKL), posterior-weighted CDM discrimination index (PWCDI), and Shannon entropy (SHE) performed similarly and outperformed posterior-weighted attribute-level CDM discrimination index (PWACDI) in classification accuracy and test efficiency, especially on short tests. MER with a zero-one loss function, MER-U0, MER-U1, and PWACDI utilized item banks more effectively than the other methods. 
Overall, these results show the feasibility of using MER in CD-CAT to increase the accuracy for specific attribute profiles to address different user needs.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073635/pdf/10.1177_01466216211066610.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Robust Likelihood Estimators to Mitigate Bias From Rapid Guessing.","authors":"Joseph A Rios","doi":"10.1177/01466216221084371","DOIUrl":"https://doi.org/10.1177/01466216221084371","url":null,"abstract":"<p>Rapid guessing (RG) behavior can undermine measurement property and score-based inferences. To mitigate this potential bias, practitioners have relied on response time information to identify and filter RG responses. However, response times may be unavailable in many testing contexts, such as paper-and-pencil administrations. When this is the case, self-report measures of effort and person-fit statistics have been used. These methods are limited in that inferences concerning motivation and aberrant responding are made at the examinee level. As test takers can engage in a mixture of solution and RG behavior throughout a test administration, there is a need to limit the influence of potential aberrant responses at the item level. This can be done by employing robust estimation procedures. Since these estimators have received limited attention in the RG literature, the objective of this simulation study was to evaluate ability parameter estimation accuracy in the presence of RG by comparing maximum likelihood estimation (MLE) to two robust variants, the bisquare and Huber estimators. Two RG conditions were manipulated, RG percentage (10%, 20%, and 40%) and pattern (difficulty-based and changing state). Contrasted to the MLE procedure, results demonstrated that both the bisquare and Huber estimators reduced bias in ability parameter estimates by as much as 94%. 
Given that the Huber estimator showed smaller standard deviations of error and performed equally as well as the bisquare approach under most conditions, it is recommended as a promising approach to mitigating bias from RG when response time information is unavailable.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073634/pdf/10.1177_01466216221084371.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BayMDS: An R Package for Bayesian Multidimensional Scaling and Choice of Dimension.","authors":"Man-Suk Oh, Eun-Kyung Lee","doi":"10.1177/01466216221084219","DOIUrl":"https://doi.org/10.1177/01466216221084219","url":null,"abstract":"MDSIC computes and plots MDSIC that can be used to select optimal number of dimensions for a given data set. There are also a few plot functions. plotObj shows pairwise scatter plots of object configuration in a Euclidean space for the first three dimensions. plotTrace provides trace plots of parameter samples for visual inspection of MCMC convergence. plotDelDist plots the observed dissimilarity measures versus Euclidean distances computed from BMDS object configuration. bayMDSApp shows the results of bayMDS in a web-based GUI (graphical user","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9073637/pdf/10.1177_01466216221084219.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9748237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurement of Ability in Adaptive Learning and Assessment Systems when Learners Use On-Demand Hints","authors":"M. Bolsinova, Benjamin E. Deonovic, Meirav Arieli-Attali, Burr Settles, Masato Hagiwara, G. Maris","doi":"10.1177/01466216221084208","DOIUrl":"https://doi.org/10.1177/01466216221084208","url":null,"abstract":"Adaptive learning and assessment systems support learners in acquiring knowledge and skills in a particular domain. The learners’ progress is monitored through them solving items matching their level and aiming at specific learning goals. Scaffolding and providing learners with hints are powerful tools in helping the learning process. One way of introducing hints is to make hint use the choice of the student. When the learner is certain of their response, they answer without hints, but if the learner is not certain or does not know how to approach the item they can request a hint. We develop measurement models for applications where such on-demand hints are available. Such models take into account that hint use may be informative of ability, but at the same time may be influenced by other individual characteristics. Two modeling strategies are considered: (1) The measurement model is based on a scoring rule for ability which includes both response accuracy and hint use. (2) The choice to use hints and response accuracy conditional on this choice are modeled jointly using Item Response Tree models. The properties of different models and their implications are discussed. An application to data from Duolingo, an adaptive language learning system, is presented. Here, the best model is the scoring-rule-based model with full credit for correct responses without hints, partial credit for correct responses with hints, and no credit for all incorrect responses. 
The second dimension in the model accounts for the individual differences in the tendency to use hints.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43517867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Sampling Variability When Estimating the Explained Common Variance","authors":"Björn Andersson, Hao Luo","doi":"10.1177/01466216221084215","DOIUrl":"https://doi.org/10.1177/01466216221084215","url":null,"abstract":"Assessing multidimensionality of a scale or test is a staple of educational and psychological measurement. One approach to evaluate approximate unidimensionality is to fit a bifactor model where the subfactors are determined by substantive theory and estimate the explained common variance (ECV) of the general factor. The ECV says to what extent the explained variance is dominated by the general factor over the specific factors, and has been used, together with other methods and statistics, to determine if a single factor model is sufficient for analyzing a scale or test (Rodriguez et al., 2016). In addition, the individual item-ECV (I-ECV) has been used to assess approximate unidimensionality of individual items (Carnovale et al., 2021; Stucky et al., 2013). However, the ECV and I-ECV are subject to random estimation error which previous studies have not considered. Not accounting for the error in estimation can lead to conclusions regarding the dimensionality of a scale or item that are inaccurate, especially when an estimate of ECV or I-ECV is compared to a pre-specified cut-off value to evaluate unidimensionality. The objective of the present study is to derive standard errors of the estimators of ECV and I-ECV with linear confirmatory factor analysis (CFA) models to enable the assessment of random estimation error and the computation of confidence intervals for the parameters. We use Monte-Carlo simulation to assess the accuracy of the derived standard errors and evaluate the impact of sampling variability on the estimation of the ECV and I-ECV. In a bifactor model for J items, denote X_j, j = 1, ..., J, as the observed variable and let G denote the general factor. 
We define the S subfactors F_s, s ∈ {1, ..., S}, and J_s as the set of indicators for each subfactor. Each observed indicator X_j is then defined by the multiple factor model (McDonald, 2013)","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42137052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Standard Errors of Kernel Equating: Accounting for Bandwidth Estimation","authors":"Kseniia Marcq, Björn Andersson","doi":"10.1177/01466216211066601","DOIUrl":"https://doi.org/10.1177/01466216211066601","url":null,"abstract":"In standardized testing, equating is used to ensure comparability of test scores across multiple test administrations. One equipercentile observed-score equating method is kernel equating, where an essential step is to obtain continuous approximations to the discrete score distributions by applying a kernel with a smoothing bandwidth parameter. When estimating the bandwidth, additional variability is introduced which is currently not accounted for when calculating the standard errors of equating. This poses a threat to the accuracy of the standard errors of equating. In this study, the asymptotic variance of the bandwidth parameter estimator is derived and a modified method for calculating the standard error of equating that accounts for the bandwidth estimation variability is introduced for the equivalent groups design. A simulation study is used to verify the derivations and confirm the accuracy of the modified method across several sample sizes and test lengths as compared to the existing method and the Monte Carlo standard error of equating estimates. The results show that the modified standard errors of equating are accurate under the considered conditions. 
Furthermore, the modified and the existing methods produce similar results which suggest that the bandwidth variability impact on the standard error of equating is minimal.","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2022-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49283258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}