Journal of Educational and Behavioral Statistics最新文献

The Rank-2PL IRT Models for Forced-Choice Questionnaires: Maximum Marginal Likelihood Estimation with an EM Algorithm. 强迫选择问卷的Rank-2PL IRT模型：基于EM算法的最大边际似然估计。

IF 1.7 3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2025-06-01 Epub Date: 2024-06-18 DOI: 10.3102/10769986241256030

Jianbin Fu, Xuan Tan, Patrick C Kyllonen

引用次数: 0

Mixed-Effects Location Scale Models for Joint Modeling School Value-Added Effects on the Mean and Variance of Student Achievement. 联合建模学校增值对学生成绩均值和方差影响的混合效应区位尺度模型。

IF 1.9 3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2024-12-01 Epub Date: 2023-11-27 DOI: 10.3102/10769986231210808

George Leckie, Richard Parker, Harvey Goldstein, Kate Tilling

{"title":"Mixed-Effects Location Scale Models for Joint Modeling School Value-Added Effects on the Mean and Variance of Student Achievement.","authors":"George Leckie, Richard Parker, Harvey Goldstein, Kate Tilling","doi":"10.3102/10769986231210808","DOIUrl":"10.3102/10769986231210808","url":null,"abstract":"<p><p>School value-added models are widely applied to study, monitor, and hold schools to account for school differences in student learning. The traditional model is a mixed-effects linear regression of student current achievement on student prior achievement, background characteristics, and a school random intercept effect. The latter is referred to as the school value-added score and measures the mean student covariate-adjusted achievement in each school. In this article, we argue that further insights may be gained by additionally studying the variance in this quantity in each school. These include the ability to identify both individual schools and school types that exhibit unusually high or low variability in student achievement, even after accounting for differences in student intakes. We explore and illustrate how this can be done via fitting mixed-effects location scale versions of the traditional school value-added model. We discuss the implications of our work for research and school accountability systems.</p>","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"49 6","pages":"879-911"},"PeriodicalIF":1.9,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7617570/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143811973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Improving Balance in Educational Measurement: A Legacy of E. F. Lindquist 改善教育测量的平衡性：林奎斯特的遗产

IF 2.4 3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2024-01-07 DOI: 10.3102/10769986231218306

Daniel Koretz

引用次数: 0

A Simple Technique Assessing Ordinal and Disordinal Interaction Effects 评估顺序和非顺序交互效应的简单技术

IF 2.4 3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2023-12-21 DOI: 10.3102/10769986231217472

Sang-June Park, Youjae Yi

引用次数: 0

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement 潜在语义分析与潜在德里希勒分配在教育测量中的比较

IF 2.4 3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2023-11-27 DOI: 10.3102/10769986231209446

Jordan M. Wheeler, Allan S. Cohen, Shiyu Wang

{"title":"A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement","authors":"Jordan M. Wheeler, Allan S. Cohen, Shiyu Wang","doi":"10.3102/10769986231209446","DOIUrl":"https://doi.org/10.3102/10769986231209446","url":null,"abstract":"Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming more common in educational measurement research as a method for analyzing students’ responses to constructed-response items. Two popular topic models are latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). LSA uses linear algebra techniques, whereas LDA uses an assumed statistical model and generative process. In educational measurement, LSA is often used in algorithmic scoring of essays due to its high reliability and agreement with human raters. LDA is often used as a supplemental analysis to gain additional information about students, such as their thinking and reasoning. This article reviews and compares the LSA and LDA topic models. This article also introduces a methodology for comparing the semantic spaces obtained by the two models and uses a simulation study to investigate their similarities.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"30 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139231033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Sample Size Calculation and Optimal Design for Multivariate Regression-Based Norming 基于多元回归规范的样本量计算和优化设计

IF 2.4 3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2023-11-22 DOI: 10.3102/10769986231210807

Francesco Innocenti, M. Candel, Frans E. S. Tan, Gerard J. P. van Breukelen

{"title":"Sample Size Calculation and Optimal Design for Multivariate Regression-Based Norming","authors":"Francesco Innocenti, M. Candel, Frans E. S. Tan, Gerard J. P. van Breukelen","doi":"10.3102/10769986231210807","DOIUrl":"https://doi.org/10.3102/10769986231210807","url":null,"abstract":"Normative studies are needed to obtain norms for comparing individuals with the reference population on relevant clinical or educational measures. Norms can be obtained in an efficient way by regressing the test score on relevant predictors, such as age and sex. When several measures are normed with the same sample, a multivariate regression-based approach must be adopted for at least two reasons: (1) to take into account the correlations between the measures of the same subject, in order to test certain scientific hypotheses and to reduce misclassification of subjects in clinical practice, and (2) to reduce the number of significance tests involved in selecting predictors for the purpose of norming, thus preventing the inflation of the type I error rate. A new multivariate regression-based approach is proposed that combines all measures for an individual through the Mahalanobis distance, thus providing an indicator of the individual’s overall performance. Furthermore, optimal designs for the normative study are derived under five multivariate polynomial regression models, assuming multivariate normality and homoscedasticity of the residuals, and efficient robust designs are presented in case of uncertainty about the correct model for the analysis of the normative sample. Sample size calculation formulas are provided for the new Mahalanobis distance-based approach. The results are illustrated with data from the Maastricht Aging Study (MAAS).","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"106 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139249099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Corrigendum to Power Approximations for Overall Average Effects in Meta-Analysis With Dependent Effect Sizes 有依赖效应大小的 Meta 分析中总体平均效应的功率近似值》的更正

IF 2.4 3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2023-11-17 DOI: 10.3102/10769986231207878

引用次数: 0

Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial 结合人类和自动评分方法在写作的实验评估:一个案例研究教程

3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2023-11-08 DOI: 10.3102/10769986231207886

Reagan Mozer, Luke Miratrix, Jackie Eunjung Relyea, James S. Kim

{"title":"Combining Human and Automated Scoring Methods in Experimental Assessments of Writing: A Case Study Tutorial","authors":"Reagan Mozer, Luke Miratrix, Jackie Eunjung Relyea, James S. Kim","doi":"10.3102/10769986231207886","DOIUrl":"https://doi.org/10.3102/10769986231207886","url":null,"abstract":"In a randomized trial that collects text as an outcome, traditional approaches for assessing treatment impact require that each document first be manually coded for constructs of interest by human raters. An impact analysis can then be conducted to compare treatment and control groups, using the hand-coded scores as a measured outcome. This process is both time and labor-intensive, which creates a persistent barrier for large-scale assessments of text. Furthermore, enriching one’s understanding of a found impact on text outcomes via secondary analyses can be difficult without additional scoring efforts. The purpose of this article is to provide a pipeline for using machine-based text analytic and data mining tools to augment traditional text-based impact analysis by analyzing impacts across an array of automatically generated text features. In this way, we can explore what an overall impact signifies in terms of how the text has evolved due to treatment. Through a case study based on a recent field trial in education, we show that machine learning can indeed enrich experimental evaluations of text by providing a more comprehensive and fine-grained picture of the mechanisms that lead to stronger argumentative writing in a first- and second-grade content literacy intervention. Relying exclusively on human scoring, by contrast, is a lost opportunity. Overall, the workflow and analytical strategy we describe can serve as a template for researchers interested in performing their own experimental evaluations of text.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"159 8‐10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135393035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Two-Level Adaptive Test Battery 双电平自适应测试电池

3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2023-11-06 DOI: 10.3102/10769986231209447

Wim J. van der Linden, Luping Niu, Seung W. Choi

引用次数: 0

Analyzing Polytomous Test Data: A Comparison Between an Information-Based IRT Model and the Generalized Partial Credit Model 多重测试数据分析:基于信息的IRT模型与广义部分信用模型的比较

3区心理学

Journal of Educational and Behavioral Statistics Pub Date : 2023-11-06 DOI: 10.3102/10769986231207879

Joakim Wallmark, James O. Ramsay, Juan Li, Marie Wiberg

{"title":"Analyzing Polytomous Test Data: A Comparison Between an Information-Based IRT Model and the Generalized Partial Credit Model","authors":"Joakim Wallmark, James O. Ramsay, Juan Li, Marie Wiberg","doi":"10.3102/10769986231207879","DOIUrl":"https://doi.org/10.3102/10769986231207879","url":null,"abstract":"Item response theory (IRT) models the relationship between the possible scores on a test item against a test taker’s attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"43 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135681661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0