Educational and Psychological Measurement最新文献_第8页

The NEAT Equating Via Chaining Random Forests in the Context of Small Sample Sizes: A Machine-Learning Method. 在小样本量的背景下，通过链接随机森林的NEAT等式：一种机器学习方法。

IF 2.1 3区心理学

Educational and Psychological Measurement Pub Date : 2023-10-01 Epub Date: 2022-09-04 DOI: 10.1177/00131644221120899

Zhehan Jiang, Yuting Han, Lingling Xu, Dexin Shi, Ren Liu, Jinying Ouyang, Fen Cai

{"title":"The NEAT Equating Via Chaining Random Forests in the Context of Small Sample Sizes: A Machine-Learning Method.","authors":"Zhehan Jiang, Yuting Han, Lingling Xu, Dexin Shi, Ren Liu, Jinying Ouyang, Fen Cai","doi":"10.1177/00131644221120899","DOIUrl":"10.1177/00131644221120899","url":null,"abstract":"The part of responses that is absent in the nonequivalent groups with anchor test (NEAT) design can be managed to a planned missing scenario. In the context of small sample sizes, we present a machine learning (ML)-based imputation technique called chaining random forests (CRF) to perform equating tasks within the NEAT design. Specifically, seven CRF-based imputation equating methods are proposed based on different data augmentation methods. The equating performance of the proposed methods is examined through a simulation study. Five factors are considered: (a) test length (20, 30, 40, 50), (b) sample size per test form (50 versus 100), (c) ratio of common/anchor items (0.2 versus 0.3), and (d) equivalent versus nonequivalent groups taking the two forms (no mean difference versus a mean difference of 0.5), and (e) three different types of anchors (random, easy, and hard), resulting in 96 conditions. In addition, five traditional equating methods, (1) Tucker method; (2) Levine observed score method; (3) equipercentile equating method; (4) circle-arc method; and (5) concurrent calibration based on Rasch model, were also considered, plus seven CRF-based imputation equating methods for a total of 12 methods in this study. The findings suggest that benefiting from the advantages of ML techniques, CRF-based methods that incorporate the equating result of the Tucker method, such as IMP_total_Tucker, IMP_pair_Tucker, and IMP_Tucker_cirlce methods, can yield more robust and trustable estimates for the \"missingness\" in an equating task and therefore result in more accurate equated scores than other counterparts in short-length tests with small samples.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 5","pages":"984-1006"},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470159/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10357823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Generalized Mantel-Haenszel Estimators for Simultaneous Differential Item Functioning Tests. 同时微分项函数检验的广义Mantel-Haenszel估计。

IF 2.1 3区心理学

Educational and Psychological Measurement Pub Date : 2023-10-01 Epub Date: 2022-10-15 DOI: 10.1177/00131644221128341

Ivy Liu, Thomas Suesse, Samuel Harvey, Peter Yongqi Gu, Daniel Fernández, John Randal

引用次数: 0

Detecting Cheating in Large-Scale Assessment: The Transfer of Detectors to New Tests. 大规模评估中的作弊检测：检测器向新测试的转移。

IF 2.1 3区心理学

Educational and Psychological Measurement Pub Date : 2023-10-01 Epub Date: 2022-11-04 DOI: 10.1177/00131644221132723

Jochen Ranger, Nico Schmidt, Anett Wolgast

{"title":"Detecting Cheating in Large-Scale Assessment: The Transfer of Detectors to New Tests.","authors":"Jochen Ranger, Nico Schmidt, Anett Wolgast","doi":"10.1177/00131644221132723","DOIUrl":"10.1177/00131644221132723","url":null,"abstract":"Recent approaches to the detection of cheaters in tests employ detectors from the field of machine learning. Detectors based on supervised learning algorithms achieve high accuracy but require labeled data sets with identified cheaters for training. Labeled data sets are usually not available at an early stage of the assessment period. In this article, we discuss the approach of adapting a detector that was trained previously with a labeled training data set to a new unlabeled data set. The training and the new data set may contain data from different tests. The adaptation of detectors to new data or tasks is denominated as transfer learning in the field of machine learning. We first discuss the conditions under which a detector of cheating can be transferred. We then investigate whether the conditions are met in a real data set. We finally evaluate the benefits of transferring a detector of cheating. We find that a transferred detector has higher accuracy than an unsupervised detector of cheating. A naive transfer that consists of a simple reuse of the detector increases the accuracy considerably. A transfer via a self-labeling (SETRED) algorithm increases the accuracy slightly more than the naive transfer. The findings suggest that the detection of cheating might be improved by using existing detectors of cheating at an early stage of an assessment period.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 5","pages":"1033-1058"},"PeriodicalIF":2.1,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10470164/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10525104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Multimodal Data Fusion to Detect Preknowledge Test-Taking Behavior Using Machine Learning 利用机器学习检测预见性应试行为的多模态数据融合

3区心理学

Educational and Psychological Measurement Pub Date : 2023-09-19 DOI: 10.1177/00131644231193625

Kaiwen Man

引用次数: 0

Fixed Effects or Mixed Effects Classifiers? Evidence From Simulated and Archival Data. 固定效应还是混合效应分类器？来自模拟数据和档案数据的证据

IF 2.1 3区心理学

Educational and Psychological Measurement Pub Date : 2023-08-01 Epub Date: 2022-06-30 DOI: 10.1177/00131644221108180

Anthony A Mangino, Jocelyn H Bolin, W Holmes Finch

引用次数: 0

Exploration of the Stacking Ensemble Machine Learning Algorithm for Cheating Detection in Large-Scale Assessment. 探索用于大规模评估作弊检测的堆叠集合机器学习算法。

IF 2.1 3区心理学

Educational and Psychological Measurement Pub Date : 2023-08-01 Epub Date: 2022-08-13 DOI: 10.1177/00131644221117193

Todd Zhou, Hong Jiao

{"title":"Exploration of the Stacking Ensemble Machine Learning Algorithm for Cheating Detection in Large-Scale Assessment.","authors":"Todd Zhou, Hong Jiao","doi":"10.1177/00131644221117193","DOIUrl":"10.1177/00131644221117193","url":null,"abstract":"Cheating detection in large-scale assessment received considerable attention in the extant literature. However, none of the previous studies in this line of research investigated the stacking ensemble machine learning algorithm for cheating detection. Furthermore, no study addressed the issue of class imbalance using resampling. This study explored the application of the stacking ensemble machine learning algorithm to analyze the item response, response time, and augmented data of test-takers to detect cheating behaviors. The performance of the stacking method was compared with that of two other ensemble methods (bagging and boosting) as well as six base non-ensemble machine learning algorithms. Issues related to class imbalance and input features were addressed. The study results indicated that stacking, resampling, and feature sets including augmented summary data generally performed better than its counterparts in cheating detection. Compared with other competing machine learning algorithms investigated in this study, the meta-model from stacking using discriminant analysis based on the top two base models-Gradient Boosting and Random Forest-generally performed the best when item responses and the augmented summary statistics were used as the input features with an under-sampling ratio of 10:1 among all the study conditions.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 4","pages":"831-854"},"PeriodicalIF":2.1,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9747522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Comparing the Psychometric Properties of a Scale Across Three Likert and Three Alternative Formats: An Application to the Rosenberg Self-Esteem Scale. 比较三种李克特和三种替代格式量表的心理测量特性:在罗森博格自尊量表中的应用。

IF 2.7 3区心理学

Educational and Psychological Measurement Pub Date : 2023-08-01 DOI: 10.1177/00131644221111402

Xijuan Zhang, Linnan Zhou, Victoria Savalei

{"title":"Comparing the Psychometric Properties of a Scale Across Three Likert and Three Alternative Formats: An Application to the Rosenberg Self-Esteem Scale.","authors":"Xijuan Zhang, Linnan Zhou, Victoria Savalei","doi":"10.1177/00131644221111402","DOIUrl":"https://doi.org/10.1177/00131644221111402","url":null,"abstract":"Zhang and Savalei proposed an alternative scale format to the Likert format, called the Expanded format. In this format, response options are presented in complete sentences, which can reduce acquiescence bias and method effects. The goal of the current study was to compare the psychometric properties of the Rosenberg Self-Esteem Scale (RSES) in the Expanded format and in two other alternative formats, relative to several versions of the traditional Likert format. We conducted two studies to compare the psychometric properties of the RSES across the different formats. We found that compared with the Likert format, the alternative formats tend to have a unidimensional factor structure, less response inconsistency, and comparable validity. In addition, we found that the Expanded format resulted in the best factor structure among the three alternative formats. Researchers should consider the Expanded format, especially when creating short psychological scales such as the RSES.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 4","pages":"649-683"},"PeriodicalIF":2.7,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/0c/99/10.1177_00131644221111402.PMC10311935.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9802113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Relative Robustness of CDMs and (M)IRT in Measuring Growth in Latent Skills. CDMs 和 (M)IRT 在衡量潜在技能增长方面的相对稳健性。

IF 2.1 3区心理学

Educational and Psychological Measurement Pub Date : 2023-08-01 Epub Date: 2022-08-18 DOI: 10.1177/00131644221117194

Qi Helen Huang, Daniel M Bolt

{"title":"Relative Robustness of CDMs and (M)IRT in Measuring Growth in Latent Skills.","authors":"Qi Helen Huang, Daniel M Bolt","doi":"10.1177/00131644221117194","DOIUrl":"10.1177/00131644221117194","url":null,"abstract":"Previous studies have demonstrated evidence of latent skill continuity even in tests intentionally designed for measurement of binary skills. In addition, the assumption of binary skills when continuity is present has been shown to potentially create a lack of invariance in item and latent ability parameters that may undermine applications. In this article, we examine measurement of growth as one such application, and consider multidimensional item response theory (MIRT) as a competing alternative. Motivated by prior findings concerning the effects of skill continuity, we study the relative robustness of cognitive diagnostic models (CDMs) and (M)IRT models in the measurement of growth under both binary and continuous latent skill distributions. We find CDMs to be a less robust way of quantifying growth under misspecification, and subsequently provide a real-data example suggesting underestimation of growth as a likely consequence. It is suggested that researchers should regularly attend to the assumptions associated with the use of latent binary skills and consider (M)IRT as a potentially more robust alternative if unsure of their discrete nature.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 4","pages":"808-830"},"PeriodicalIF":2.1,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311955/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9747520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Are Speeded Tests Unfair? Modeling the Impact of Time Limits on the Gender Gap in Mathematics. 快速测试不公平吗?时间限制对数学性别差距的影响建模。

IF 2.7 3区心理学

Educational and Psychological Measurement Pub Date : 2023-08-01 DOI: 10.1177/00131644221111076

Andrea H Stoevenbelt, Jelte M Wicherts, Paulette C Flore, Lorraine A T Phillips, Jakob Pietschnig, Bruno Verschuere, Martin Voracek, Inga Schwabe

{"title":"Are Speeded Tests Unfair? Modeling the Impact of Time Limits on the Gender Gap in Mathematics.","authors":"Andrea H Stoevenbelt, Jelte M Wicherts, Paulette C Flore, Lorraine A T Phillips, Jakob Pietschnig, Bruno Verschuere, Martin Voracek, Inga Schwabe","doi":"10.1177/00131644221111076","DOIUrl":"https://doi.org/10.1177/00131644221111076","url":null,"abstract":"When cognitive and educational tests are administered under time limits, tests may become speeded and this may affect the reliability and validity of the resulting test scores. Prior research has shown that time limits may create or enlarge gender gaps in cognitive and academic testing. On average, women complete fewer items than men when a test is administered with a strict time limit, whereas gender gaps are frequently reduced when time limits are relaxed. In this study, we propose that gender differences in test strategy might inflate gender gaps favoring men, and relate test strategy to stereotype threat effects under which women underperform due to the pressure of negative stereotypes about their performance. First, we applied a Bayesian two-dimensional item response theory (IRT) model to data obtained from two registered reports that investigated stereotype threat in mathematics, and estimated the latent correlation between underlying test strategy (here, completion factor, a proxy for working speed) and mathematics ability. Second, we tested the gender gap and assessed potential effects of stereotype threat on female test performance. We found a positive correlation between the completion factor and mathematics ability, such that more able participants dropped out later in the test. We did not observe a stereotype threat effect but found larger gender differences on the latent completion factor than on latent mathematical ability, suggesting that test strategies affect the gender gap in timed mathematics performance. We argue that if the effect of time limits on tests is not taken into account, this may lead to test unfairness and biased group comparisons, and urge researchers to consider these effects in either their analyses or study planning.","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":"83 4","pages":"684-709"},"PeriodicalIF":2.7,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311959/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10299044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

A Robust Method for Detecting Item Misfit in Large-Scale Assessments. 在大规模评估中检测项目错位的稳健方法。

IF 2.1 3区心理学

Educational and Psychological Measurement Pub Date : 2023-08-01 Epub Date: 2022-07-02 DOI: 10.1177/00131644221105819

Matthias von Davier, Ummugul Bezirhan

引用次数: 0