{"title":"Using Item Scores and Distractors to Detect Test Speededness.","authors":"Kylie Gorney, James A Wollack, Daniel M Bolt","doi":"10.1177/01466216231182149","DOIUrl":"10.1177/01466216231182149","url":null,"abstract":"<p><p>Test speededness refers to a situation in which examinee performance is inadvertently affected by the time limit of the test. Because speededness has the potential to severely bias both person and item parameter estimates, it is crucial that speeded examinees are detected. In this article, we develop a change-point analysis (CPA) procedure for detecting test speededness. Our procedure distinguishes itself from existing CPA procedures by using information from both item scores and distractors. Using detailed simulations, we show that under most conditions, the new CPA procedure improves the detection of speeded examinees and produces more accurate change-point estimates. It therefore seems there is a considerable amount of information to be gained from the item distractors, which, quite notably, are available in all multiple-choice data. A real data example is also provided.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552735/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41171818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential Bayesian Ability Estimation Applied to Mixed-Format Item Tests.","authors":"Jiawei Xiong, Allan S Cohen, Xinhui Maggie Xiong","doi":"10.1177/01466216231201986","DOIUrl":"10.1177/01466216231201986","url":null,"abstract":"<p><p>Large-scale tests often contain mixed-format items, such as when multiple-choice (MC) items and constructed-response (CR) items are both contained in the same test. Although previous research has analyzed both types of items simultaneously, this may not always provide the best estimate of ability. In this paper, a two-step sequential Bayesian (SB) analytic method under the concept of empirical Bayes is explored for mixed item response models. This method integrates ability estimates from different item formats. Unlike the empirical Bayes method, the SB method estimates examinees' posterior ability parameters with individual-level sample-dependent prior distributions estimated from the MC items. Simulations were used to evaluate the accuracy of recovery of ability and item parameters over four factors: the type of the ability distribution, sample size, test length (number of items for each item type), and person/item parameter estimation method. The SB method was compared with a traditional concurrent Bayesian (CB) calibration method, EAPsum, that uses scaled scores for summed scores to estimate parameters from the MC and CR items simultaneously in one estimation step. From the simulation results, the SB method showed more accurate and reliable ability estimation than the CB method, especially when the sample size was small (150 and 500). Both methods presented similar recovery results for MC item parameters, but the CB method yielded a bit better recovery of the CR item parameters. 
The empirical example suggested that posterior ability estimated by the proposed SB method had higher reliability than the CB method.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41180283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Rating Order Effects Under Item Response Theory Models for Rater-Mediated Assessments.","authors":"Hung-Yu Huang","doi":"10.1177/01466216231174566","DOIUrl":"https://doi.org/10.1177/01466216231174566","url":null,"abstract":"<p><p>Rater effects are commonly observed in rater-mediated assessments. By using item response theory (IRT) modeling, raters can be treated as independent factors that function as instruments for measuring ratees. Most rater effects are static and can be addressed appropriately within an IRT framework, and a few models have been developed for dynamic rater effects. Operational rating projects often require human raters to continuously and repeatedly score ratees over a certain period, imposing a burden on the cognitive processing abilities and attention spans of raters that stems from judgment fatigue and thus affects the rating quality observed during the rating period. As a result, ratees' scores may be influenced by the order in which they are graded by raters in a rating sequence, and the rating order effect should be considered in new IRT models. In this study, two types of many-faceted (MF)-IRT models are developed to account for such dynamic rater effects, which assume that rater severity can drift systematically or stochastically. The results obtained from two simulation studies indicate that the parameters of the newly developed models can be estimated satisfactorily using Bayesian estimation and that disregarding the rating order effect produces biased model structure and ratee proficiency parameter estimations. 
A creativity assessment is outlined to demonstrate the application of the new models and to investigate the consequences of failing to detect the possible rating order effect in a real rater-mediated evaluation.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/7c/68/10.1177_01466216231174566.PMC10240569.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10300637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mixed Sequential IRT Model for Mixed-Format Items.","authors":"Junhuan Wei, Yan Cai, Dongbo Tu","doi":"10.1177/01466216231165302","DOIUrl":"10.1177/01466216231165302","url":null,"abstract":"<p><p>To provide more insight into an individual's response process and cognitive process, this study proposed three mixed sequential item response models (MS-IRMs) for mixed-format items consisting of a mixture of a multiple-choice item and an open-ended item that emphasize a sequential response process and are scored sequentially. Relative to existing polytomous models such as the graded response model (GRM), generalized partial credit model (GPCM), or traditional sequential Rasch model (SRM), the proposed models employ an appropriate processing function for each task to improve conventional polytomous models. Simulation studies were carried out to investigate the performance of the proposed models, and the results indicated that all proposed models outperformed the SRM, GRM, and GPCM in terms of parameter recovery and model fit. An application illustration of the MS-IRMs in comparison with traditional models was demonstrated by using real data from TIMSS 2007.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240568/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10297969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Parameter Estimation for Student Evaluation of Teaching.","authors":"Chia-Wen Chen, Chen-Wei Liu","doi":"10.1177/01466216231165314","DOIUrl":"10.1177/01466216231165314","url":null,"abstract":"<p><p>Student evaluation of teaching (SET) assesses students' experiences in a class to evaluate teachers' performance in class. SET essentially comprises three facets: teaching proficiency, student rating harshness, and item properties. The computerized adaptive testing form of SET with an established item pool has been used in educational environments. However, conventional scoring methods ignore the harshness of students toward teachers and, therefore, are unable to provide a valid assessment. In addition, simultaneously estimating teachers' teaching proficiency and students' harshness remains an unaddressed issue in the context of online SET. In the current study, we develop and compare three novel methods (marginal, iterative once, and hybrid approaches) to improve the precision of parameter estimations. A simulation study is conducted to demonstrate that the hybrid method is a promising technique that can substantially outperform traditional methods.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240567/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10300642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using a Generalized Logistic Regression Method to Detect Differential Item Functioning With Multiple Groups in Cognitive Diagnostic Tests.","authors":"Xiaojian Sun, Shimeng Wang, Lei Guo, Tao Xin, Naiqing Song","doi":"10.1177/01466216231174559","DOIUrl":"10.1177/01466216231174559","url":null,"abstract":"<p><p>Items with the presence of differential item functioning (DIF) will compromise the validity and fairness of a test. Studies have investigated the DIF effect in the context of cognitive diagnostic assessment (CDA), and some DIF detection methods have been proposed. Most of these methods are designed mainly to detect the presence of DIF between two groups; however, empirical situations may contain more than two groups. To date, only a handful of studies have detected the DIF effect with multiple groups in the CDA context. This study uses the generalized logistic regression (GLR) method to detect DIF items by using the estimated attribute profile as matching criteria. A simulation study is conducted to examine the performance of the two GLR methods, GLR-based Wald test (GLR-Wald) and GLR-based likelihood ratio test (GLR-LRT), in detecting the DIF items; the results based on the ordinary Wald test are also reported. Results show that (1) both GLR-Wald and GLR-LRT have more reasonable performance in controlling Type I error rates than the ordinary Wald test in most conditions; (2) the GLR method also produces higher empirical rejection rates than the ordinary Wald test in most conditions; and (3) using the estimated attribute profile as the matching criteria can produce similar Type I error rates and empirical rejection rates for GLR-Wald and GLR-LRT. 
A real data example is also analyzed to illustrate the application of these DIF detection methods in multiple groups.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240570/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10300639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of Item Model Parameter Variations on Person Parameter Estimation in Computerized Adaptive Testing With Automatically Generated Items.","authors":"Chen Tian, Jaehwa Choi","doi":"10.1177/01466216231165313","DOIUrl":"10.1177/01466216231165313","url":null,"abstract":"<p><p>Sibling items developed through automatic item generation share similar but not identical psychometric properties. However, considering sibling item variations may bring huge computation difficulties and little improvement on scoring. Assuming identical characteristics among siblings, this study explores the impact of item model parameter variations (i.e., within-family variation between siblings) on person parameter estimation in linear tests and Computerized Adaptive Testing (CAT). Specifically, we explore (1) what if small/medium/large within-family variance is ignored, (2) if the effect of larger within-model variance can be compensated by greater test length, (3) if the item model pool properties affect the impact of within-family variance on scoring, and (4) if the issues in (1) and (2) are different in linear vs. adaptive testing. A related sibling model is used for data generation and an identical sibling model is assumed for scoring. Manipulated factors include test length, the size of within-model variation, and item model pool characteristics. Results show that as within-family variance increases, the standard error of scores remains at similar levels. For correlations between true and estimated score and RMSE, the effect of the larger within-model variance was compensated by test length. For bias, scores are biased towards the center, and bias was not compensated by test length. Although the within-family variation is random in the current simulations, to yield less biased ability estimates, the item model pool should provide balanced opportunities such that \"fake-easy\" and \"fake-difficult\" item instances cancel their effects. 
The results of CAT are similar to those of linear tests, except for higher efficiency.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240571/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10300640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Approach to Desirable Responding: Multidimensional Item Response Model of Overclaiming Data.","authors":"Kuan-Yu Jin, Delroy L Paulhus, Ching-Lin Shih","doi":"10.1177/01466216231151704","DOIUrl":"10.1177/01466216231151704","url":null,"abstract":"<p><p>A variety of approaches have been presented for assessing desirable responding in self-report measures. Among them, the overclaiming technique asks respondents to rate their familiarity with a large set of real and nonexistent items (foils). The application of signal detection formulas to the endorsement rates of real items and foils yields indices of (a) <i>knowledge accuracy</i> and (b) <i>knowledge bias</i>. This overclaiming technique reflects both cognitive ability and personality. Here, we develop an alternative measurement model based on multidimensional item response theory (MIRT). We report three studies demonstrating this new model's capacity to analyze overclaiming data. First, a simulation study illustrates that MIRT and signal detection theory yield comparable indices of accuracy and bias, although MIRT provides important additional information. Two empirical examples (one based on mathematical terms and one based on Chinese idioms) are then elaborated. Together, they demonstrate the utility of this new approach for group comparisons and item selection. 
The implications of this research are illustrated and discussed.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126390/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9363746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Testlet Diagnostic Classification Model with Attribute Hierarchies.","authors":"Wenchao Ma, Chun Wang, Jiaying Xiao","doi":"10.1177/01466216231165315","DOIUrl":"10.1177/01466216231165315","url":null,"abstract":"<p><p>In this article, a testlet hierarchical diagnostic classification model (TH-DCM) was introduced to take both attribute hierarchies and item bundles into account. The expectation-maximization algorithm with an analytic dimension reduction technique was used for parameter estimation. A simulation study was conducted to assess the parameter recovery of the proposed model under varied conditions, and to compare TH-DCM with testlet higher-order CDM (THO-DCM; Hansen, M. (2013). Hierarchical item response models for cognitive diagnosis (Unpublished doctoral dissertation). UCLA; Zhan, P., Li, X., Wang, W.-C., Bian, Y., & Wang, L. (2015). The multidimensional testlet-effect cognitive diagnostic models. Acta Psychologica Sinica, 47(5), 689. https://doi.org/10.3724/SP.J.1041.2015.00689). Results showed that (1) ignoring large testlet effects worsened parameter recovery, (2) DCMs assuming equal testlet effects within each testlet performed as well as the testlet model assuming unequal testlet effects under most conditions, (3) misspecifications in joint attribute distribution had a differential impact on parameter recovery, and (4) THO-DCM seems to be a robust alternative to TH-DCM under some hierarchical structures. 
A set of real data was also analyzed for illustration.</p>","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126385/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9357116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Folly of Introducing A (Time-Based UMV), While Designing for B (Time-Based CMV).","authors":"Alice Brawley Newlin","doi":"10.1177/01466216231165304","DOIUrl":"10.1177/01466216231165304","url":null,"abstract":"","PeriodicalId":48300,"journal":{"name":"Applied Psychological Measurement","volume":null,"pages":null},"PeriodicalIF":1.2,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10126389/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9363899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}