Sandip Sinharay, Matthew S Johnson, Wei Wang, Jing Miao
{"title":"Targeted Double Scoring of Performance Tasks Using a Decision-Theoretic Approach.","authors":"Sandip Sinharay, Matthew S Johnson, Wei Wang, Jing Miao","doi":"10.1177/01466216221129271","DOIUrl":"10.1177/01466216221129271","url":null,"abstract":"<p><p>Targeted double scoring, or, double scoring of only some (but not all) responses, is used to reduce the burden of scoring performance tasks for several mastery tests (Finkelman, Darby, & Nering, 2008). An approach based on statistical decision theory (e.g., Berger, 1989; Ferguson, 1967; Rudner, 2009) is suggested to evaluate and potentially improve upon the existing strategies in targeted double scoring for mastery tests. An application of the approach to data from an operational mastery test shows that a refinement of the currently used strategy would lead to substantial cost savings.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 2","pages":"155-163"},"PeriodicalIF":1.2,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9979197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9393345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Niek Frans, Johan Braeken, Bernard P Veldkamp, Muirne C S Paap
{"title":"Empirical Priors in Polytomous Computerized Adaptive Tests: Risks and Rewards in Clinical Settings.","authors":"Niek Frans, Johan Braeken, Bernard P Veldkamp, Muirne C S Paap","doi":"10.1177/01466216221124091","DOIUrl":"https://doi.org/10.1177/01466216221124091","url":null,"abstract":"<p><p>The use of empirical prior information about participants has been shown to substantially improve the efficiency of computerized adaptive tests (CATs) in educational settings. However, it is unclear how these results translate to clinical settings, where small item banks with highly informative polytomous items often lead to very short CATs. We explored the risks and rewards of using prior information in CAT in two simulation studies, rooted in applied clinical examples. In the first simulation, prior precision and bias in the prior location were manipulated independently. Our results show that a precise personalized prior can meaningfully increase CAT efficiency. However, this reward comes with the potential risk of overconfidence in wrong empirical information (i.e., using a precise severely biased prior), which can lead to unnecessarily long tests, or severely biased estimates. The latter risk can be mitigated by setting a minimum number of items that are to be administered during the CAT, or by setting a less precise prior, albeit at the expense of canceling out any efficiency gains. The second simulation, with more realistic bias and precision combinations in the empirical prior, places the prevalence of the potential risks in context. With similar estimation bias, an empirical prior reduced CAT test length, compared to a standard normal prior, in 68% of cases, by a median of 20%; while test length increased in only 3% of cases. The use of prior information in CAT seems to be a feasible and simple method to reduce test burden for patients and clinical practitioners alike.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"48-63"},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/57/79/10.1177_01466216221124091.PMC9679926.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhuangzhuang Han, Sandip Sinharay, Matthew S Johnson, Xiang Liu
{"title":"The Standardized S-<i>X</i> <sup>2</sup> Statistic for Assessing Item Fit.","authors":"Zhuangzhuang Han, Sandip Sinharay, Matthew S Johnson, Xiang Liu","doi":"10.1177/01466216221108077","DOIUrl":"10.1177/01466216221108077","url":null,"abstract":"<p><p>The S-<i>X</i> <sup>2</sup> statistic (Orlando & Thissen, 2000) is popular among researchers and practitioners who are interested in the assessment of item fit. However, the statistic suffers from the Chernoff-Lehmann problem (Chernoff & Lehmann, 1954) and hence does not have a known asymptotic null distribution. This paper suggests a modified version of the S-<i>X</i> <sup>2</sup> statistic that is based on the modified Rao-Robson <i>χ</i> <sup>2</sup> statistic (Rao & Robson, 1974). A simulation study and a real data analysis demonstrate that the use of the modified statistic instead of the S-<i>X</i> <sup>2</sup> statistic would lead to fewer items being flagged for misfit.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"3-18"},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9679924/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Katherine E Castellano, Sandip Sinharay, Jiangang Hao, Chen Li
{"title":"An Investigation Into the Impact of Test Session Disruptions for At-Home Test Administrations.","authors":"Katherine E Castellano, Sandip Sinharay, Jiangang Hao, Chen Li","doi":"10.1177/01466216221128011","DOIUrl":"10.1177/01466216221128011","url":null,"abstract":"<p><p>In response to the closures of test centers worldwide due to the COVID-19 pandemic, several testing programs offered large-scale standardized assessments to examinees remotely. However, due to the varying quality of the performance of personal devices and internet connections, more at-home examinees likely suffered \"disruptions\" or an interruption in the connectivity to their testing session compared to typical test-center administrations. Disruptions have the potential to adversely affect examinees and lead to fairness or validity issues. The goal of this study was to investigate the extent to which disruptions impacted performance of at-home examinees using data from a large-scale admissions test. Specifically, the study involved comparing the average test scores of the disrupted examinees with those of the non-disrupted examinees after weighting the non-disrupted examinees to resemble the disrupted examinees along baseline characteristics. The results show that disruptions had a small negative impact on test scores on average. However, there was little difference in performance between the disrupted and non-disrupted examinees after removing records of the disrupted examinees who were unable to complete the test.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"76-82"},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9679922/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying Negative Binomial Distribution in Diagnostic Classification Models for Analyzing Count Data.","authors":"Ren Liu, Ihnwhi Heo, Haiyan Liu, Dexin Shi, Zhehan Jiang","doi":"10.1177/01466216221124604","DOIUrl":"10.1177/01466216221124604","url":null,"abstract":"<p><p>Diagnostic classification models (DCMs) have been used to classify examinees into groups based on their possession status of a set of latent traits. In addition to traditional item-based scoring approaches, examinees may be scored based on their completion of a series of small and similar tasks. Those scores are usually considered as count variables. To model count scores, this study proposes a new class of DCMs that uses the negative binomial distribution at its core. We explained the proposed model framework and demonstrated its use through an operational example. Simulation studies were conducted to evaluate the performance of the proposed model and compare it with the Poisson-based DCM.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"64-75"},"PeriodicalIF":1.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/94/10.1177_01466216221124604.PMC9679925.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feri Wijayanto, Ioan Gabriel Bucur, Perry Groot, Tom Heskes
{"title":"autoRasch: An R Package to Do Semi-Automated Rasch Analysis.","authors":"Feri Wijayanto, Ioan Gabriel Bucur, Perry Groot, Tom Heskes","doi":"10.1177/01466216221125178","DOIUrl":"https://doi.org/10.1177/01466216221125178","url":null,"abstract":"<p><p>The R package autoRasch has been developed to perform a Rasch analysis in a (semi-)automated way. The automated part of the analysis is achieved by optimizing the so-called <i>in-plus-out-of-questionnaire log-likelihood</i> (IPOQ-LL) or IPOQ-LL-DIF when differential item functioning (DIF) is included. These criteria measure the quality of fit on a pre-collected survey, depending on which items are included in the final instrument. To compute these criteria, autoRasch fits the generalized partial credit model (GPCM) or the generalized partial credit model with differential item functioning (GPCM-DIF) using penalized joint maximum likelihood estimation (PJMLE). The package further allows the user to reevaluate the output of the automated method and use it as a basis for performing a manual Rasch analysis and provides standard statistics of Rasch analyses (e.g., outfit, infit, person separation reliability, and residual correlation) to support the model reevaluation.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"83-85"},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9679921/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Outlier Detection Using t-test in Rasch IRT Equating under NEAT Design.","authors":"Chunyan Liu, Daniel Jurich","doi":"10.1177/01466216221124045","DOIUrl":"10.1177/01466216221124045","url":null,"abstract":"<p><p>In equating practice, the existence of outliers in the anchor items may deteriorate the equating accuracy and threaten the validity of test scores. Therefore, stability of the anchor item performance should be evaluated before conducting equating. This study used simulation to investigate the performance of the <i>t</i>-test method in detecting outliers and compared its performance with other outlier detection methods, including the logit difference method with 0.5 and 0.3 as the cutoff values and the robust <i>z</i> statistic with 2.7 as the cutoff value. The investigated factors included sample size, proportion of outliers, item difficulty drift direction, and group difference. Across all simulated conditions, the <i>t</i>-test method outperformed the other methods in terms of sensitivity of flagging true outliers, bias of the estimated translation constant, and the root mean square error of examinee ability estimates.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"34-47"},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9679927/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kuan-Yu Jin, Chia-Ling Hsu, Ming Ming Chiu, Po-Hsi Chen
{"title":"Modeling Rapid Guessing Behaviors in Computer-Based Testlet Items.","authors":"Kuan-Yu Jin, Chia-Ling Hsu, Ming Ming Chiu, Po-Hsi Chen","doi":"10.1177/01466216221125177","DOIUrl":"10.1177/01466216221125177","url":null,"abstract":"<p><p>In traditional test models, test items are independent, and test-takers slowly and thoughtfully respond to each test item. However, some test items have a common stimulus (dependent test items in a testlet), and sometimes test-takers lack motivation, knowledge, or time (speededness), so they perform rapid guessing (RG). Ignoring the dependence in responses to testlet items can negatively bias standard errors of measurement, and ignoring RG by fitting a simpler item response theory (IRT) model can bias the results. Because computer-based testing captures response times on testlet responses, we propose a mixture testlet IRT model with item responses and response time to model RG behaviors in computer-based testlet items. Two simulation studies with Markov chain Monte Carlo estimation using the JAGS program showed (a) good recovery of the item and person parameters in this new model and (b) the harmful consequences of ignoring RG (biased parameter estimates: overestimated item difficulties, underestimated time intensities, underestimated respondent latent speed parameters, and overestimated precision of respondent latent estimates). The application of IRT models with and without RG to data from a computer-based language test showed parameter differences resembling those in the simulations.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"47 1","pages":"19-33"},"PeriodicalIF":1.2,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9679923/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40494726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Metropolis-Hastings Robbins-Monro Algorithm for High-Dimensional Diagnostic Classification Models.","authors":"Chen-Wei Liu","doi":"10.1177/01466216221123981","DOIUrl":"10.1177/01466216221123981","url":null,"abstract":"<p><p>The expectation-maximization (EM) algorithm is a commonly used technique for the parameter estimation of the diagnostic classification models (DCMs) with a prespecified Q-matrix; however, it requires <i>O</i>(2 <sup><i>K</i></sup> ) calculations in its expectation-step, which significantly slows down the computation when the number of attributes, <i>K</i>, is large. This study proposes an efficient Metropolis-Hastings Robbins-Monro (eMHRM) algorithm, needing only <i>O</i>(<i>K</i> + 1) calculations in the Monte Carlo expectation step. Furthermore, the item parameters and structural parameters are approximated via the Robbins-Monro algorithm, which does not require time-consuming nonlinear optimization procedures. A series of simulation studies were conducted to compare the eMHRM with the EM and a Metropolis-Hastings (MH) algorithm regarding the parameter recovery and execution time. The outcomes presented in this article reveal that the eMHRM is much more computationally efficient than the EM and MH, and it tends to produce better estimates than the EM when <i>K</i> is large, suggesting that the eMHRM is a promising parameter estimation method for high-dimensional DCMs.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"662-674"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9574082/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40656644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attenuation-Corrected Estimators of Reliability.","authors":"Jari Metsämuuronen","doi":"10.1177/01466216221108131","DOIUrl":"https://doi.org/10.1177/01466216221108131","url":null,"abstract":"<p><p>The estimates of reliability are usually attenuated and deflated because the item-score correlation ( <math> <mrow><msub><mi>ρ</mi> <mrow><mi>g</mi> <mi>X</mi></mrow> </msub> </mrow> </math> , <i>Rit</i>) embedded in the most widely used estimators is affected by several sources of mechanical error in the estimation. Empirical examples show that, in some types of datasets, the estimates by traditional alpha may be deflated by 0.40-0.60 units of reliability and those by maximal reliability by 0.40 units of reliability. This article proposes a new kind of estimator of correlation: attenuation-corrected correlation (<i>R</i> <sub><i>AC</i></sub> ): the proportion of observed correlation with the maximal possible correlation reachable by the given item and score. By replacing <math> <mrow><msub><mi>ρ</mi> <mrow><mi>g</mi> <mi>X</mi></mrow> </msub> </mrow> </math> with <i>R</i> <sub><i>AC</i></sub> in known formulas of estimators of reliability, we get attenuation-corrected alpha, theta, omega, and maximal reliability which all belong to a family of so-called deflation-corrected estimators of reliability.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":"46 8","pages":"720-737"},"PeriodicalIF":1.2,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/66/7b/10.1177_01466216221108131.PMC9574086.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40573822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}