{"title":"Confidence Screening Detector: A New Method for Detecting Test Collusion.","authors":"Yongze Xu, Ying Cui, Xinyi Wang, Meiwei Huang, Fang Luo","doi":"10.1177/01466216231165299","DOIUrl":"10.1177/01466216231165299","url":null,"abstract":"<p><p>Test collusion (TC) is a form of cheating in which, examinees operate in groups to alter normal item responses. TC is becoming increasingly common, especially within high-stakes, large-scale examinations. However, research on TC detection methods remains scarce. The present article proposes a new algorithm for TC detection, inspired by variable selection within high-dimensional statistical analysis. The algorithm relies only on item responses and supports different response similarity indices. Simulation and practical studies were conducted to (1) compare the performance of the new algorithm against the recently developed clique detector approach, and (2) verify the performance of the new algorithm in a large-scale test setting.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126388/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9363896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Computerized Adaptive Testing with Batteries of Unidimensional Tests.","authors":"Pasquale Anselmi, Egidio Robusto, Francesca Cristante","doi":"10.1177/01466216231165301","DOIUrl":"10.1177/01466216231165301","url":null,"abstract":"<p><p>The article presents a new computerized adaptive testing (CAT) procedure for use with batteries of unidimensional tests. At each step of testing, the estimate of a certain ability is updated on the basis of the response to the latest administered item and the current estimates of all other abilities measured by the battery. The information deriving from these abilities is incorporated into an empirical prior that is updated each time that new estimates of the abilities are computed. In two simulation studies, the performance of the proposed procedure is compared with that of a standard procedure for CAT with batteries of unidimensional tests. The proposed procedure yields more accurate ability estimates in fixed-length CATs, and a reduction of test length in variable-length CATs. These gains in accuracy and efficiency increase with the correlation between the abilities measured by the batteries.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126386/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9357115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Likelihood Approach to Item Response Theory Equating of Multiple Forms.","authors":"Michela Battauz, Waldir Leôncio","doi":"10.1177/01466216231151702","DOIUrl":"10.1177/01466216231151702","url":null,"abstract":"<p><p>Test equating is a statistical procedure to make scores from different test forms comparable and interchangeable. Focusing on an IRT approach, this paper proposes a novel method that simultaneously links the item parameter estimates of a large number of test forms. Our proposal differentiates itself from the current state of the art by using likelihood-based methods and by taking into account the heteroskedasticity and the correlation of the item parameter estimates of each form. Simulation studies show that our proposal yields equating coefficient estimates which are more efficient than what is currently available in the literature.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126387/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9357110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparison of Confirmatory Factor Analysis and Network Models for Measurement Invariance Assessment When Indicator Residuals are Correlated.","authors":"W Holmes Finch, Brian F French, Alicia Hazelwood","doi":"10.1177/01466216231151700","DOIUrl":"10.1177/01466216231151700","url":null,"abstract":"<p><p>Social science research is heavily dependent on the use of standardized assessments of a variety of phenomena, such as mood, executive functioning, and cognitive ability. An important assumption when using these instruments is that they perform similarly for all members of the population. When this assumption is violated, the validity evidence of the scores is called into question. The standard approach for assessing the factorial invariance of the measures across subgroups within the population involves multiple groups confirmatory factor analysis (MGCFA). CFA models typically, but not always, assume that once the latent structure of the model is accounted for, the residual terms for the observed indicators are uncorrelated (local independence). Commonly, correlated residuals are introduced after a baseline model shows inadequate fit and inspection of modification indices ensues to remedy fit. An alternative procedure for fitting latent variable models that may be useful when local independence does not hold is based on network models. In particular, the residual network model (RNM) offers promise with respect to fitting latent variable models in the absence of local independence via an alternative search procedure. This simulation study compared the performances of MGCFA and RNM for measurement invariance assessment when local independence is violated, and residual covariances are themselves not invariant. Results revealed that RNM had better Type I error control and higher power compared to MGCFA when local independence was absent. Implications of the results for statistical practice are discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979199/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10845586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Effects of Rating Designs on Rater Classification Accuracy and Rater Measurement Precision in Large-Scale Mixed-Format Assessments.","authors":"Wenjing Guo, Stefanie A Wind","doi":"10.1177/01466216231151705","DOIUrl":"10.1177/01466216231151705","url":null,"abstract":"<p><p>In standalone performance assessments, researchers have explored the influence of different rating designs on the sensitivity of latent trait model indicators to different rater effects as well as the impacts of different rating designs on student achievement estimates. However, the literature provides little guidance on the degree to which different rating designs might affect rater classification accuracy (severe/lenient) and rater measurement precision in both standalone performance assessments and mixed-format assessments. Using results from an analysis of National Assessment of Educational Progress (NAEP) data, we conducted simulation studies to systematically explore the impacts of different rating designs on rater measurement precision and rater classification accuracy (severe/lenient) in mixed-format assessments. The results suggest that the complete rating design produced the highest rater classification accuracy and greatest rater measurement precision, followed by the multiple-choice (MC) + spiral link design and the MC link design. Considering that complete rating designs are not practical in most testing situations, the MC + spiral link design may be a useful choice because it balances cost and performance. We consider the implications of our findings for research and practice.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979195/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10846015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Equating Transformations in IRT Observed-Score and Kernel Equating Methods.","authors":"Waldir Leôncio, Marie Wiberg, Michela Battauz","doi":"10.1177/01466216221124087","DOIUrl":"10.1177/01466216221124087","url":null,"abstract":"<p><p>Test equating is a statistical procedure to ensure that scores from different test forms can be used interchangeably. There are several methodologies available to perform equating, some of which are based on the Classical Test Theory (CTT) framework and others are based on the Item Response Theory (IRT) framework. This article compares equating transformations originated from three different frameworks, namely IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). The comparisons were made under different data-generating scenarios, which include the development of a novel data-generation procedure that allows the simulation of test data without relying on IRT parameters while still providing control over some test score properties such as distribution skewness and item difficulty. Our results suggest that IRT methods tend to provide better results than KE even when the data are not generated from IRT processes. KE might be able to provide satisfactory results if a proper pre-smoothing solution can be found, while also being much faster than IRT methods. For daily applications, we recommend observing the sensibility of the results to the equating method, minding the importance of good model fit and meeting the assumptions of the framework.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/74/30/10.1177_01466216221124087.PMC9979196.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10846018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heywood Cases in Unidimensional Factor Models and Item Response Models for Binary Data.","authors":"Selena Wang, Paul De Boeck, Marcel Yotebieng","doi":"10.1177/01466216231151701","DOIUrl":"10.1177/01466216231151701","url":null,"abstract":"<p><p>Heywood cases are known from linear factor analysis literature as variables with communalities larger than 1.00, and in present day factor models, the problem also shows in negative residual variances. For binary data, factor models for ordinal data can be applied with either delta parameterization or theta parametrization. The former is more common than the latter and can yield Heywood cases when limited information estimation is used. The same problem shows up as non convergence cases in theta parameterized factor models and as extremely large discriminations in item response theory (IRT) models. In this study, we explain why the same problem appears in different forms depending on the method of analysis. We first discuss this issue using equations and then illustrate our conclusions using a small simulation study, where all three methods, delta and theta parameterized ordinal factor models (with estimation based on polychoric correlations and thresholds) and an IRT model (with full information estimation), are used to analyze the same datasets. The results generalize across WLS, WLSMV, and ULS estimators for the factor models for ordinal data. Finally, we analyze real data with the same three approaches. The results of the simulation study and the analysis of real data confirm the theoretical conclusions.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979198/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10846019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Targeted Double Scoring of Performance Tasks Using a Decision-Theoretic Approach.","authors":"Sandip Sinharay, Matthew S Johnson, Wei Wang, Jing Miao","doi":"10.1177/01466216221129271","DOIUrl":"10.1177/01466216221129271","url":null,"abstract":"<p><p>Targeted double scoring, or, double scoring of only some (but not all) responses, is used to reduce the burden of scoring performance tasks for several mastery tests (Finkelman, Darby, & Nering, 2008). An approach based on statistical decision theory (e.g., Berger, 1989; Ferguson, 1967; Rudner, 2009) is suggested to evaluate and potentially improve upon the existing strategies in targeted double scoring for mastery tests. An application of the approach to data from an operational mastery test shows that a refinement of the currently used strategy would lead to substantial cost savings.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9393345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Priors in Polytomous Computerized Adaptive Tests: Risks and Rewards in Clinical Settings.","authors":"Niek Frans, Johan Braeken, Bernard P Veldkamp, Muirne C S Paap","doi":"10.1177/01466216221124091","DOIUrl":"https://doi.org/10.1177/01466216221124091","url":null,"abstract":"<p><p>The use of empirical prior information about participants has been shown to substantially improve the efficiency of computerized adaptive tests (CATs) in educational settings. However, it is unclear how these results translate to clinical settings, where small item banks with highly informative polytomous items often lead to very short CATs. We explored the risks and rewards of using prior information in CAT in two simulation studies, rooted in applied clinical examples. In the first simulation, prior precision and bias in the prior location were manipulated independently. Our results show that a precise personalized prior can meaningfully increase CAT efficiency. However, this reward comes with the potential risk of overconfidence in wrong empirical information (i.e., using a precise severely biased prior), which can lead to unnecessarily long tests, or severely biased estimates. The latter risk can be mitigated by setting a minimum number of items that are to be administered during the CAT, or by setting a less precise prior; be it at the expense of canceling out any efficiency gains. The second simulation, with more realistic bias and precision combinations in the empirical prior, places the prevalence of the potential risks in context. With similar estimation bias, an empirical prior reduced CAT test length, compared to a standard normal prior, in 68% of cases, by a median of 20%; while test length increased in only 3% of cases. The use of prior information in CAT seems to be a feasible and simple method to reduce test burden for patients and clinical practitioners alike.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/57/79/10.1177_01466216221124091.PMC9679926.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Standardized S-<i>X</i> <sup>2</sup> Statistic for Assessing Item Fit.","authors":"Zhuangzhuang Han, Sandip Sinharay, Matthew S Johnson, Xiang Liu","doi":"10.1177/01466216221108077","DOIUrl":"10.1177/01466216221108077","url":null,"abstract":"<p><p>The S-<i>X</i> <sup>2</sup> statistic (Orlando & Thissen, 2000) is popular among researchers and practitioners who are interested in the assessment of item fit. However, the statistic suffers from the Chernoff-Lehmann problem (Chernoff & Lehmann, 1954) and hence does not have a known asymptotic null distribution. This paper suggests a modified version of the S-<i>X</i> <sup>2</sup> statistic that is based on the modified Rao-Robson <i>χ</i> <sup>2</sup> statistic (Rao & Robson, 1974). A simulation study and a real data analyses demonstrate that the use of the modified statistic instead of the S-<i>X</i> <sup>2</sup> statistic would lead to fewer items being flagged for misfit.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9679924/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}