Applied Psychological Measurement: Latest Articles (Page 6)

Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests

Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-10-19 | DOI: 10.1177/01466216231209757

Joakim Wallmark, Maria Josefsson, Marie Wiberg

Citations: 0

Comparing Person-Fit and Traditional Indices Across Careless Response Patterns in Surveys

IF 1 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-09-01 | Epub Date: 2023-08-03 | DOI: 10.1177/01466216231194358

Eli A Jones, Stefanie A Wind, Chia-Lin Tsai, Yuan Ge

Abstract: Methods to identify carelessness in survey research can be valuable tools in reducing bias during survey development, validation, and use. Because carelessness may take multiple forms, researchers typically use multiple indices when identifying carelessness. In the current study, we extend the literature on careless response identification by examining the usefulness of three item response theory-based person-fit indices for both random and overconsistent careless response identification: infit MSE, outfit MSE, and the polytomous lz statistic. We compared these statistics with traditional careless response indices using both empirical data and simulated data. The empirical data included 2,049 high school student surveys of teaching effectiveness from the Network for Educator Effectiveness. In the simulated data, we manipulated type of carelessness (random response or overconsistency) and percent of carelessness present (0%, 5%, 10%, 20%). Results suggest that infit and outfit MSE and the lz statistic may provide complementary information to traditional indices such as LongString, Mahalanobis Distance, Validity Items, and Completion Time. Receiver operating characteristic curves suggested that the person-fit indices showed good sensitivity and specificity for classifying both overconsistent and underconsistent careless patterns, thus functioning in a bidirectional manner. Carelessness classifications based on low fit values correlated with carelessness classifications from LongString and completion time, and classifications based on high fit values correlated with classifications from Mahalanobis Distance. We consider implications for research and practice.

Volume 47, Issue 5-6, pp. 365-385 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552731/pdf/
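The person-fit indices named in this abstract have standard closed forms. As a minimal sketch, assuming a dichotomous Rasch model with known person and item parameters (the study itself works with polytomous survey items, which this simplification does not cover), outfit MSE is the mean squared standardized residual, infit MSE is its information-weighted counterpart, and lz standardizes the log-likelihood of the response pattern. A simple LongString count is included for comparison. The function names and item difficulties are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """P(X = 1) under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def person_fit(responses, theta, difficulties):
    """Return (outfit MSE, infit MSE, lz) for one respondent's 0/1 pattern.

    Assumes at least one item with p != 0.5, otherwise the lz variance is 0.
    """
    sum_std_sq = 0.0  # sum of squared standardized residuals (outfit numerator)
    sum_sq = 0.0      # sum of squared raw residuals (infit numerator)
    sum_var = 0.0     # sum of model variances p(1 - p) (infit denominator)
    loglik = expected = variance = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        w = p * (1.0 - p)
        resid_sq = (x - p) ** 2
        sum_std_sq += resid_sq / w
        sum_sq += resid_sq
        sum_var += w
        # Observed log-likelihood plus its model-implied mean and variance
        loglik += x * math.log(p) + (1 - x) * math.log(1.0 - p)
        expected += p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
        variance += w * math.log(p / (1.0 - p)) ** 2
    outfit = sum_std_sq / len(responses)
    infit = sum_sq / sum_var
    lz = (loglik - expected) / math.sqrt(variance)
    return outfit, infit, lz


def longstring(responses):
    """Length of the longest run of identical consecutive responses."""
    if not responses:
        return 0
    longest = current = 1
    for prev, cur in zip(responses, responses[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)
    return longest


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(person_fit([1, 1, 1, 0, 0], 0.0, difficulties))  # model-consistent pattern
print(person_fit([0, 0, 0, 1, 1], 0.0, difficulties))  # reversed (aberrant) pattern
print(longstring([3, 3, 3, 3, 2]))
```

Under the usual reading, outfit and infit well above 1 (or strongly negative lz) point to noisy, random-looking responding, while values well below 1 point to overly consistent patterns, which is the bidirectional use the abstract describes.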

Citations: 0

Does Sparseness Matter? Examining the Use of Generalizability Theory and Many-Facet Rasch Measurement in Sparse Rating Designs

IF 1 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-09-01 | Epub Date: 2023-06-07 | DOI: 10.1177/01466216231182148

Stefanie A Wind, Eli Jones, Sara Grajeda

Citations: 0

The Effects of Aberrant Responding on Model-Fit Assuming Different Underlying Response Processes

IF 1 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-09-01 | Epub Date: 2023-09-19 | DOI: 10.1177/01466216231201987

Jennifer Reimers, Ronna C Turner, Jorge N Tendeiro, Wen-Juo Lo, Elizabeth Keiffer

Abstract: Aberrant responding on tests and surveys has been shown to affect the psychometric properties of scales and the statistical analyses from the use of those scales in cumulative model contexts. This study extends prior research by comparing the effects of four types of aberrant responding on model fit in both cumulative and ideal point model contexts using generalized partial credit (GPCM) and generalized graded unfolding (GGUM) models. When fitting models to data, model misfit can be a function of both misspecification and aberrant responding. Results demonstrate how varying levels of aberrant data can severely impact model fit for both cumulative and ideal point data. Specifically, longstring responses have a stronger impact on dimensionality for both ideal point and cumulative data, while random responding tends to have the most negative impact on data-model fit according to information criteria (AIC, BIC). The results also indicate that ideal point data models such as GGUM may be able to fit cumulative data as well as the cumulative model itself (GPCM), whereas cumulative data models may not provide sufficient model fit for data simulated using an ideal point model.

Volume 47, Issue 5-6, pp. 420-437 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552732/pdf/
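To make the cumulative side of this comparison concrete, here is a minimal sketch of GPCM category probabilities, a data log-likelihood, AIC/BIC, and generators for two of the aberrant patterns the abstract mentions (random and longstring responding). It assumes known person and item parameters; the item values, parameter counts, and helper names are hypothetical, and the ideal point GGUM is not sketched.

```python
import math
import random


def gpcm_probs(theta, a, steps):
    """Category probabilities 0..m under the generalized partial credit model.

    steps holds step difficulties b_1..b_m; category k has numerator
    exp(sum_{v<=k} a * (theta - b_v)), with an empty sum for k = 0.
    """
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    z_max = max(z)  # subtract the max for numerical stability
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def gpcm_loglik(data, thetas, items):
    """Log-likelihood of a persons-by-items matrix of category scores."""
    ll = 0.0
    for row, theta in zip(data, thetas):
        for x, (a, steps) in zip(row, items):
            ll += math.log(gpcm_probs(theta, a, steps)[x])
    return ll


def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * math.log(n_obs)


def random_responder(n_items, n_categories, rng):
    """Aberrant pattern: every category equally likely on every item."""
    return [rng.randrange(n_categories) for _ in range(n_items)]


def longstring_responder(n_items, n_categories, rng):
    """Aberrant pattern: one category repeated across all items."""
    return [rng.randrange(n_categories)] * n_items


items = [(1.2, [-1.0, 0.0, 1.0]), (0.8, [-0.5, 0.5, 1.5])]
rng = random.Random(1)
print(gpcm_probs(0.0, 1.2, [-1.0, 0.0, 1.0]))
print(random_responder(5, 4, rng), longstring_responder(5, 4, rng))
```

Comparing AIC/BIC values (lower is better) for GPCM and GGUM fits to the same contaminated data is the kind of comparison the study reports; estimating the parameters themselves would require a dedicated IRT package.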

Citations: 0

Using Item Scores and Distractors to Detect Test Speededness

IF 1 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-09-01 | Epub Date: 2023-06-15 | DOI: 10.1177/01466216231182149

Kylie Gorney, James A Wollack, Daniel M Bolt

Citations: 0

Sequential Bayesian Ability Estimation Applied to Mixed-Format Item Tests

IF 1 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-09-01 | Epub Date: 2023-09-08 | DOI: 10.1177/01466216231201986

Jiawei Xiong, Allan S Cohen, Xinhui Maggie Xiong

Abstract: Large-scale tests often contain mixed-format items, such as when multiple-choice (MC) items and constructed-response (CR) items are both contained in the same test. Although previous research has analyzed both types of items simultaneously, this may not always provide the best estimate of ability. In this paper, a two-step sequential Bayesian (SB) analytic method under the concept of empirical Bayes is explored for mixed item response models. This method integrates ability estimates from different item formats. Unlike the empirical Bayes method, the SB method estimates examinees' posterior ability parameters with individual-level, sample-dependent prior distributions estimated from the MC items. Simulations were used to evaluate the accuracy of recovery of ability and item parameters over four factors: the type of ability distribution, sample size, test length (number of items for each item type), and person/item parameter estimation method. The SB method was compared with a traditional concurrent Bayesian (CB) calibration method, EAPsum, which uses scaled scores for summed scores to estimate parameters from the MC and CR items simultaneously in one estimation step. In the simulations, the SB method showed more accurate and reliable ability estimation than the CB method, especially when the sample size was small (150 and 500). Both methods showed similar recovery of MC item parameters, but the CB method yielded slightly better recovery of the CR item parameters. The empirical example suggested that posterior ability estimated by the proposed SB method had higher reliability than that estimated by the CB method.

Volume 47, Issue 5-6, pp. 402-419 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10552734/pdf/
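The two-step idea in this abstract can be sketched with grid-based EAP estimation. This is a minimal illustration under stated assumptions, not the authors' implementation: MC items are assumed to follow a 2PL model, CR items a GPCM, item parameters are treated as known, and the individual-level prior carried into step 2 is assumed to be a normal approximation (posterior mean and SD) of the step-1 posterior. All item values and function names are hypothetical.

```python
import math

GRID = [-4.0 + 8.0 * i / 160 for i in range(161)]  # quadrature grid for theta


def normal_pdf(x, mu=0.0, sd=1.0):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def p_2pl(theta, a, b):
    """P(correct) for a multiple-choice item under a 2PL model (assumed)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def gpcm_probs(theta, a, steps):
    """Category probabilities for a constructed-response item under a GPCM (assumed)."""
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    z_max = max(z)
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def posterior_moments(prior, likelihood):
    """Posterior mean and SD on GRID given prior weights and a likelihood function."""
    post = [w * likelihood(t) for w, t in zip(prior, GRID)]
    total = sum(post)
    mean = sum(t * w for t, w in zip(GRID, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, post)) / total
    return mean, math.sqrt(var)


def sequential_eap(mc_resp, mc_items, cr_resp, cr_items):
    # Step 1: MC items only, with a standard normal population prior.
    def lik_mc(t):
        out = 1.0
        for x, (a, b) in zip(mc_resp, mc_items):
            p = p_2pl(t, a, b)
            out *= p if x == 1 else 1.0 - p
        return out

    mu1, sd1 = posterior_moments([normal_pdf(t) for t in GRID], lik_mc)

    # Step 2: individual-level N(mu1, sd1) prior from step 1, updated with CR items.
    def lik_cr(t):
        out = 1.0
        for x, (a, steps) in zip(cr_resp, cr_items):
            out *= gpcm_probs(t, a, steps)[x]
        return out

    mu2, sd2 = posterior_moments([normal_pdf(t, mu1, sd1) for t in GRID], lik_cr)
    return mu1, sd1, mu2, sd2


mc_items = [(1.0, -1.0), (1.2, 0.0), (0.9, 0.5), (1.1, 1.0)]
cr_items = [(1.0, [-0.5, 0.5]), (0.8, [0.0, 1.0])]
print(sequential_eap([1, 1, 1, 1], mc_items, [2, 2], cr_items))
```

With known item parameters, carrying the full step-1 posterior forward would reproduce the one-step joint posterior exactly; in this sketch, the normal approximation is what separates the two-step result from a joint one.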

Citations: 0

Modeling Rating Order Effects Under Item Response Theory Models for Rater-Mediated Assessments

IF 1.2 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-06-01 | DOI: 10.1177/01466216231174566

Hung-Yu Huang

Abstract: Rater effects are commonly observed in rater-mediated assessments. By using item response theory (IRT) modeling, raters can be treated as independent factors that function as instruments for measuring ratees. Most rater effects are static and can be addressed appropriately within an IRT framework, and a few models have been developed for dynamic rater effects. Operational rating projects often require human raters to continuously and repeatedly score ratees over a certain period, imposing a burden on the cognitive processing abilities and attention spans of raters that stems from judgment fatigue and thus affects the rating quality observed during the rating period. As a result, ratees' scores may be influenced by the order in which they are graded by raters in a rating sequence, and the rating order effect should be considered in new IRT models. In this study, two types of many-faceted (MF)-IRT models are developed to account for such dynamic rater effects, which assume that rater severity can drift systematically or stochastically. The results obtained from two simulation studies indicate that the parameters of the newly developed models can be estimated satisfactorily using Bayesian estimation and that disregarding the rating order effect produces biased model structure and ratee proficiency parameter estimations. A creativity assessment is outlined to demonstrate the application of the new models and to investigate the consequences of failing to detect the possible rating order effect in a real rater-mediated evaluation.

Volume 47, Issue 4, pp. 312-327 | Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/7c/68/10.1177_01466216231174566.PMC10240569.pdf
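The two drift mechanisms in this abstract can be illustrated with a minimal sketch, assuming a many-facet rating scale model in which a rater's severity at rating position t either shifts linearly with t (systematic drift) or follows a Gaussian random walk (stochastic drift). These parameterizations, values, and function names are illustrative assumptions, not the specific MF-IRT models developed in the paper.

```python
import math
import random


def linear_severity(base, slope, position):
    """Systematic drift (assumed form): severity shifts by a fixed amount per position."""
    return base + slope * position


def random_walk_severity(base, step_sd, n_positions, rng):
    """Stochastic drift (assumed form): severity follows a Gaussian random walk."""
    path = [base]
    for _ in range(n_positions - 1):
        path.append(path[-1] + rng.gauss(0.0, step_sd))
    return path


def rating_probs(theta, item_difficulty, severity, thresholds):
    """Many-facet rating scale probabilities for categories 0..m.

    Adjacent-category log-odds for category k versus k - 1 are
    theta - item_difficulty - severity - thresholds[k - 1].
    """
    z = [0.0]
    for tau in thresholds:
        z.append(z[-1] + theta - item_difficulty - severity - tau)
    z_max = max(z)
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


def expected_rating(probs):
    return sum(k * p for k, p in enumerate(probs))


thresholds = [-1.0, 0.0, 1.0]
for position in (0, 25, 50):
    sev = linear_severity(0.0, 0.01, position)
    print(position, round(expected_rating(rating_probs(0.5, 0.0, sev, thresholds)), 3))
```

With a positive slope, the same ratee receives a lower expected rating later in the sequence, which is the kind of order effect that would bias proficiency estimates if it were ignored.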

Citations: 1

A Mixed Sequential IRT Model for Mixed-Format Items

IF 1.2 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-06-01 | Epub Date: 2023-03-17 | DOI: 10.1177/01466216231165302

Junhuan Wei, Yan Cai, Dongbo Tu

Citations: 0

Online Parameter Estimation for Student Evaluation of Teaching

IF 1.2 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-06-01 | Epub Date: 2023-03-19 | DOI: 10.1177/01466216231165314

Chia-Wen Chen, Chen-Wei Liu

Citations: 0

Using a Generalized Logistic Regression Method to Detect Differential Item Functioning With Multiple Groups in Cognitive Diagnostic Tests

IF 1.2 | Zone 4 (Psychology)

Applied Psychological Measurement | Pub Date: 2023-06-01 | Epub Date: 2023-05-13 | DOI: 10.1177/01466216231174559

Xiaojian Sun, Shimeng Wang, Lei Guo, Tao Xin, Naiqing Song

Abstract: Items that exhibit differential item functioning (DIF) compromise the validity and fairness of a test. Studies have investigated the DIF effect in the context of cognitive diagnostic assessment (CDA), and some DIF detection methods have been proposed. Most of these methods are designed to detect DIF between two groups; however, empirical situations may involve more than two groups. To date, only a handful of studies have detected DIF with multiple groups in the CDA context. This study uses the generalized logistic regression (GLR) method to detect DIF items, using the estimated attribute profile as the matching criterion. A simulation study is conducted to examine the performance of two GLR methods, the GLR-based Wald test (GLR-Wald) and the GLR-based likelihood ratio test (GLR-LRT), in detecting DIF items; results based on the ordinary Wald test are also reported. Results show that (1) both GLR-Wald and GLR-LRT control Type I error rates more reasonably than the ordinary Wald test in most conditions; (2) the GLR method also produces higher empirical rejection rates than the ordinary Wald test in most conditions; and (3) using the estimated attribute profile as the matching criterion produces similar Type I error rates and empirical rejection rates for GLR-Wald and GLR-LRT. A real data example is also analyzed to illustrate the application of these DIF detection methods in multiple groups.

Volume 47, Issue 4, pp. 328-346 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10240570/pdf/
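When the estimated attribute profile is treated as a categorical matching variable, a likelihood ratio test for DIF across several groups can be sketched in closed form: the no-DIF model fits one proportion correct per profile, while a model with group main effects plus profile-by-group interactions fits one proportion per profile-by-group cell, so the LRT statistic reduces to a G² comparing the two. This is a simplified, assumption-laden illustration in the spirit of a GLR-LRT (it tests uniform and nonuniform DIF jointly and includes no Wald statistic), not the authors' procedure; the data layout and function names are hypothetical.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail chi-square probability, via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def g2_dif(records):
    """records: iterable of (profile, group, y) with y in {0, 1} for one item.

    Returns (G2, df, p_value) comparing a no-DIF model (one proportion per
    profile) with a model that lets each profile-by-group cell differ.
    """
    cell = defaultdict(lambda: [0, 0])  # (profile, group) -> [n_correct, n]
    prof = defaultdict(lambda: [0, 0])  # profile -> [n_correct, n]
    for profile, group, y in records:
        cell[(profile, group)][0] += y
        cell[(profile, group)][1] += 1
        prof[profile][0] += y
        prof[profile][1] += 1

    g2 = 0.0
    for (profile, _group), (s, n) in cell.items():
        p_full = s / n
        p_null = prof[profile][0] / prof[profile][1]
        # 0 * log(0) terms are taken as 0; nonzero counts imply nonzero p's.
        if s > 0:
            g2 += s * math.log(p_full / p_null)
        if n - s > 0:
            g2 += (n - s) * math.log((1.0 - p_full) / (1.0 - p_null))
    g2 *= 2.0
    df = len(cell) - len(prof)  # free cell proportions minus profile proportions
    p_value = chi2_sf(g2, df) if df > 0 else 1.0
    return g2, df, p_value


# Hypothetical data: three groups, two attribute profiles, no DIF built in.
no_dif = []
for g in ("A", "B", "C"):
    no_dif += [("11", g, 1)] * 3 + [("11", g, 0)] + [("00", g, 1)] + [("00", g, 0)] * 3
print(g2_dif(no_dif))  # → (0.0, 4, 1.0)
```

Because every group matches its profile's proportion correct in this toy data, G² is exactly zero; giving one group a much higher proportion correct within a profile produces a large G² and a small p-value.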

Citations: 0