Journal of Educational Measurement最新文献_第2页

A Note on the Use of Categorical Subscores 关于使用分类分值的说明

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2025-01-07 DOI: 10.1111/jedm.12423

Kylie Gorney, Sandip Sinharay

引用次数: 0

An Exploratory Study Using Innovative Graphical Network Analysis to Model Eye Movements in Spatial Reasoning Problem Solving 利用创新的图形网络分析模拟空间推理问题的眼动的探索性研究

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-12-20 DOI: 10.1111/jedm.12421

Kaiwen Man, Joni M. Lakin

{"title":"An Exploratory Study Using Innovative Graphical Network Analysis to Model Eye Movements in Spatial Reasoning Problem Solving","authors":"Kaiwen Man, Joni M. Lakin","doi":"10.1111/jedm.12421","DOIUrl":"https://doi.org/10.1111/jedm.12421","url":null,"abstract":"Eye-tracking procedures generate copious process data that could be valuable in establishing the response processes component of modern validity theory. However, there is a lack of tools for assessing and visualizing response processes using process data such as eye-tracking fixation sequences, especially those suitable for young children. This study, which explored student responses to a spatial reasoning task, employed eye tracking and social network analysis to model, examine, and visualize students' visual transition patterns while solving spatial problems to begin to elucidate these processes. Fifty students in Grades 2–8 completed a spatial reasoning task as eye movements were recorded. Areas of interest (AoIs) were defined within the task for each spatial reasoning question. Transition networks between AoIs were constructed and analyzed using selected network measures. Results revealed shared transition sequences across students as well as strategic differences between high and low performers. High performers demonstrated more integrated transitions between AoIs, while low performers considered information more in isolation. Additionally, age and the interaction of age and performance did not significantly impact these measures. The study demonstrates a novel modeling approach for investigating visual processing and provides initial evidence that high-performing students more deeply engage with visual information in solving these types of questions.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 4","pages":"710-739"},"PeriodicalIF":1.4,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Modeling Directional Testlet Effects on Multiple Open-Ended Questions 定向测试对多个开放式问题的影响建模

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-12-10 DOI: 10.1111/jedm.12422

Kuan-Yu Jin, Wai-Lok Siu

{"title":"Modeling Directional Testlet Effects on Multiple Open-Ended Questions","authors":"Kuan-Yu Jin, Wai-Lok Siu","doi":"10.1111/jedm.12422","DOIUrl":"https://doi.org/10.1111/jedm.12422","url":null,"abstract":"Educational tests often have a cluster of items linked by a common stimulus (testlet). In such a design, the dependencies caused between items are called testlet effects. In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect the scores on later items. This study aims to introduce an innovative measurement model to describe DTEs among multiple polytomouslyscored open-ended items. Through simulations, we found that (1) item and DTE parameters can be accurately recovered in Latent GOLD®, (2) ignoring positive (or negative) DTEs by fitting a standard item response theory model can result in the overestimation (or underestimation) of test reliability, (3) collapsing multiple items of a testlet into a super item is still effective in eliminating DTEs, (4) the popular multidimensional strategy of adding nuisance factors to describe item dependencies fails to account for DTE adequately, and (5) fitting the proposed model for DTE to testlet data involving nuisance factors will observe positive DTEs but will not have a better fit. Moreover, using the proposed model, we demonstrated the coexistence of positive and negative DTEs in a real history exam.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"62 1","pages":"81-100"},"PeriodicalIF":1.4,"publicationDate":"2024-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143688659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Differences in Time Usage as a Competing Hypothesis for Observed Group Differences in Accuracy with an Application to Observed Gender Differences in PISA Data 时间使用差异作为观察到的群体准确性差异的竞争假设，并应用于观察到的PISA数据中的性别差异

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-11-01 DOI: 10.1111/jedm.12419

Radhika Kapoor, Erin Fahle, Klint Kanopka, David Klinowski, Ana Trindade Ribeiro, Benjamin W. Domingue

{"title":"Differences in Time Usage as a Competing Hypothesis for Observed Group Differences in Accuracy with an Application to Observed Gender Differences in PISA Data","authors":"Radhika Kapoor, Erin Fahle, Klint Kanopka, David Klinowski, Ana Trindade Ribeiro, Benjamin W. Domingue","doi":"10.1111/jedm.12419","DOIUrl":"https://doi.org/10.1111/jedm.12419","url":null,"abstract":"Group differences in test scores are a key metric in education policy. Response time offers novel opportunities for understanding these differences, especially in low-stakes settings. Here, we describe how observed group differences in test accuracy can be attributed to group differences in latent response speed or group differences in latent capacity, where capacity is defined as expected accuracy for a given response speed. This article introduces a method for decomposing observed group differences in accuracy into these differences in speed versus differences in capacity. We first illustrate in simulation studies that this approach can reliably distinguish between group speed and capacity differences. We then use this approach to probe gender differences in science and reading fluency in PISA 2018 for 71 countries. In science, score differentials largely increase when males, who respond more rapidly, are the higher performing group and decrease when females, who respond more slowly, are the higher performing group. In reading fluency, score differentials decrease where females, who respond more rapidly, are the higher performing group. This method can be used to analyze group differences especially in low-stakes assessments where there are potential group differences in speed.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 4","pages":"682-709"},"PeriodicalIF":1.4,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143247456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Correction to “Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior” 修正“使用剖面相似度量扩展对数正态响应时间模型以改进异常测试行为的检测”

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-10-23 DOI: 10.1111/jedm.12418

引用次数: 0

Subscores: A Practical Guide to Their Production and Consumption. Shelby Haberman, Sandip Sinharay, Richard Feinberg, and Howard Wainer. Cambridge, Cambridge University Press 2024, 176 pp. (paperback) 分：他们的生产和消费的实用指南。Shelby Haberman, Sandip Sinharay， Richard Feinberg和Howard Wainer。剑桥，剑桥大学出版社2024年版，176页（平装本）

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-10-18 DOI: 10.1111/jedm.12417

Gautam Puhan

引用次数: 0

Using Keystroke Behavior Patterns to Detect Nonauthentic Texts in Writing Assessments: Evaluating the Fairness of Predictive Models 使用击键行为模式检测写作评估中的非真实文本：评估预测模型的公平性

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-10-18 DOI: 10.1111/jedm.12416

Yang Jiang, Mo Zhang, Jiangang Hao, Paul Deane, Chen Li

{"title":"Using Keystroke Behavior Patterns to Detect Nonauthentic Texts in Writing Assessments: Evaluating the Fairness of Predictive Models","authors":"Yang Jiang, Mo Zhang, Jiangang Hao, Paul Deane, Chen Li","doi":"10.1111/jedm.12416","DOIUrl":"https://doi.org/10.1111/jedm.12416","url":null,"abstract":"The emergence of sophisticated AI tools such as ChatGPT, coupled with the transition to remote delivery of educational assessments in the COVID-19 era, has led to increasing concerns about academic integrity and test security. Using AI tools, test takers can produce high-quality texts effortlessly and use them to game assessments. It is thus critical to detect these nonauthentic texts to ensure test integrity. In this study, we leveraged keystroke logs—recordings of every keypress—to build machine learning (ML) detectors of nonauthentic texts in a large-scale writing assessment. We focused on investigating the fairness of the detectors across demographic subgroups to ensure that nongenuine writing can be predicted equally well across subgroups. Results indicated that keystroke dynamics were effective in identifying nonauthentic texts. While the ML models were slightly more likely to misclassify the original responses submitted by male test takers as consisting of nonauthentic texts than those submitted by females, the effect sizes were negligible. Furthermore, balancing demographic distributions and class labels did not consistently mitigate detector bias across predictive models. Findings of this study not only provide implications for using behavioral data to address test security issues, but also highlight the importance of evaluating the fairness of predictive models in educational contexts.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 4","pages":"571-594"},"PeriodicalIF":1.4,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework 利用IRT残差DIF框架检测多组差异项目功能

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-10-17 DOI: 10.1111/jedm.12415

Hwanggyu Lim, Danqi Zhu, Edison M. Choe, KyungT. Han, Chris

{"title":"Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework","authors":"Hwanggyu Lim, Danqi Zhu, Edison M. Choe, KyungT. Han, Chris","doi":"10.1111/jedm.12415","DOIUrl":"https://doi.org/10.1111/jedm.12415","url":null,"abstract":"This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of implementation. The performance of GRDIF was assessed through a simulation study and compared with existing DIF detection methods, including the generalized Mantel-Haenszel, Lasso-DIF, and alignment methods. Results showed that the GRDIF framework demonstrated well-controlled Type I error rates close to the nominal level of .05 and satisfactory power in detecting uniform, nonuniform, and mixed DIF across different simulated conditions. Each of the three GRDIF statistics, <math>\u0000 <semantics>\u0000 <mrow>\u0000 <mi>G</mi>\u0000 <mi>R</mi>\u0000 <mi>D</mi>\u0000 <mi>I</mi>\u0000 <msub>\u0000 <mi>F</mi>\u0000 <mi>R</mi>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$GRDI{{F}_R}$</annotation>\u0000 </semantics></math>, <math>\u0000 <semantics>\u0000 <mrow>\u0000 <mi>G</mi>\u0000 <mi>R</mi>\u0000 <mi>D</mi>\u0000 <mi>I</mi>\u0000 <msub>\u0000 <mi>F</mi>\u0000 <mi>S</mi>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$GRDI{{F}_S}$</annotation>\u0000 </semantics></math>, and <math>\u0000 <semantics>\u0000 <mrow>\u0000 <mi>G</mi>\u0000 <mi>R</mi>\u0000 <mi>D</mi>\u0000 <mi>I</mi>\u0000 <msub>\u0000 <mi>F</mi>\u0000 <mrow>\u0000 <mi>R</mi>\u0000 <mi>S</mi>\u0000 </mrow>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$GRDI{{F}_{RS}}$</annotation>\u0000 </semantics></math>, effectively detected the specific type of DIF for which it was designed, with <math>\u0000 <semantics>\u0000 <mrow>\u0000 <mi>G</mi>\u0000 <mi>R</mi>\u0000 <mi>D</mi>\u0000 <mi>I</mi>\u0000 <msub>\u0000 <mi>F</mi>\u0000 <mrow>\u0000 <mi>R</mi>\u0000 <mi>S</mi>\u0000 </mrow>\u0000 </msub>\u0000 </mrow>\u0000 <annotation>$GRDI{{F}_{RS}}$</annotation>\u0000 </semantics></math> exhibiting the most robust performance across all types of DIF. The GRDIF framework outperformed other","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 4","pages":"656-681"},"PeriodicalIF":1.4,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12415","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An Item Response Tree Model for Items with Multiple-Choice and Constructed-Response Parts 具有多项选择和构造反应部分的项目反应树模型

IF 1.4 4区心理学

Journal of Educational Measurement Pub Date : 2024-10-07 DOI: 10.1111/jedm.12414

Junhuan Wei, Qin Wang, Buyun Dai, Yan Cai, Dongbo Tu

引用次数: 0

Sequential Reservoir Computing for Log File‐Based Behavior Process Data Analyses 基于日志文件的行为过程数据分析的顺序储层计算

IF 1.3 4区心理学

Journal of Educational Measurement Pub Date : 2024-09-14 DOI: 10.1111/jedm.12413

Jiawei Xiong, Shiyu Wang, Cheng Tang, Qidi Liu, Rufei Sheng, Bowen Wang, Huan Kuang, Allan S. Cohen, Xinhui Xiong

{"title":"Sequential Reservoir Computing for Log File‐Based Behavior Process Data Analyses","authors":"Jiawei Xiong, Shiyu Wang, Cheng Tang, Qidi Liu, Rufei Sheng, Bowen Wang, Huan Kuang, Allan S. Cohen, Xinhui Xiong","doi":"10.1111/jedm.12413","DOIUrl":"https://doi.org/10.1111/jedm.12413","url":null,"abstract":"The use of process data in assessment has gained attention in recent years as more assessments are administered by computers. Process data, recorded in computer log files, capture the sequence of examinees' response activities, for example, timestamped keystrokes, during the assessment. Traditional measurement methods are often inadequate for handling this type of data. In this paper, we proposed a sequential reservoir method (SRM) based on a reservoir computing model using the echo state network, with the particle swarm optimization and singular value decomposition as optimization. Designed to regularize features from process data through a computational self‐learning algorithm, this method has been evaluated using both simulated and empirical data. Simulation results suggested that, on one hand, the model effectively transforms action sequences into standardized and meaningful features, and on the other hand, these features are instrumental in categorizing latent behavioral groups and predicting latent information. Empirical results further indicate that SRM can predict assessment efficiency. The features extracted by SRM have been verified as related to action sequence lengths through the correlation analysis. This proposed method enhances the extraction and accessibility of meaningful information from process data, presenting an alternative to existing process data technologies.","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"16 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142253879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0