Evaluation of Item Fit With Output From the EM Algorithm: RMSD Index Based on Posterior Expectations.

IF 2.3 3区心理学 Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Educational and Psychological Measurement Pub Date : 2025-10-04 DOI:10.1177/00131644251369532

Yun-Kyung Kim, Li Cai, YoungKoung Kim

{"title":"Evaluation of Item Fit With Output From the EM Algorithm: RMSD Index Based on Posterior Expectations.","authors":"Yun-Kyung Kim, Li Cai, YoungKoung Kim","doi":"10.1177/00131644251369532","DOIUrl":null,"url":null,"abstract":"<p><p>In item response theory modeling, item fit analysis using posterior expectations, otherwise known as pseudocounts, has many advantages. They are readily obtained from the E-step output of the Bock-Aitkin Expectation-Maximization (EM) algorithm and continue to function as a basis of evaluating model fit, even when missing data are present. This paper aimed to improve the interpretability of the root mean squared deviation (RMSD) index based on posterior expectations. In Study 1, we assessed its performance using two approaches. First, we employed the poor person's posterior predictive model checking (PP-PPMC) to compute their significance levels. The resulting Type I error was generally controlled below the nominal level, but power noticeably declined with smaller sample sizes and shorter test lengths. Second, we used receiver operating characteristic (ROC) curve analysis (±) to empirically determine the reference values (cutoff thresholds) that achieve an optimal balance between false-positive and true-positive rates. Importantly, we identified optimal reference values for each combination of sample size and test length in the simulation conditions. The cutoff threshold approach outperformed the PP-PPMC approach with greater gains in true-positive rates than losses from the inflated false-positive rates. In Study 2, we extended the cutoff threshold approach to conditions with larger sample sizes and longer test lengths. Moreover, we evaluated the performance of the optimized cutoff thresholds under varying levels of data missingness. Finally, we employed response surface analysis (±) to develop a prediction model that generalizes the way the reference values vary with sample size and test length. Overall, this study demonstrates the application of the PP-PPMC for item fit diagnostics and implements a practical frequentist approach to empirically derive reference values. Using our prediction model, practitioners can compute the reference values of RMSD that are tailored to their dataset's sample size and test length.</p>","PeriodicalId":11502,"journal":{"name":"Educational and Psychological Measurement","volume":" ","pages":"00131644251369532"},"PeriodicalIF":2.3000,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12496452/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Educational and Psychological Measurement","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.1177/00131644251369532","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICS, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

引用次数: 0

Abstract

In item response theory modeling, item fit analysis using posterior expectations, otherwise known as pseudocounts, has many advantages. They are readily obtained from the E-step output of the Bock-Aitkin Expectation-Maximization (EM) algorithm and continue to function as a basis of evaluating model fit, even when missing data are present. This paper aimed to improve the interpretability of the root mean squared deviation (RMSD) index based on posterior expectations. In Study 1, we assessed its performance using two approaches. First, we employed the poor person's posterior predictive model checking (PP-PPMC) to compute their significance levels. The resulting Type I error was generally controlled below the nominal level, but power noticeably declined with smaller sample sizes and shorter test lengths. Second, we used receiver operating characteristic (ROC) curve analysis (±) to empirically determine the reference values (cutoff thresholds) that achieve an optimal balance between false-positive and true-positive rates. Importantly, we identified optimal reference values for each combination of sample size and test length in the simulation conditions. The cutoff threshold approach outperformed the PP-PPMC approach with greater gains in true-positive rates than losses from the inflated false-positive rates. In Study 2, we extended the cutoff threshold approach to conditions with larger sample sizes and longer test lengths. Moreover, we evaluated the performance of the optimized cutoff thresholds under varying levels of data missingness. Finally, we employed response surface analysis (±) to develop a prediction model that generalizes the way the reference values vary with sample size and test length. Overall, this study demonstrates the application of the PP-PPMC for item fit diagnostics and implements a practical frequentist approach to empirically derive reference values. Using our prediction model, practitioners can compute the reference values of RMSD that are tailored to their dataset's sample size and test length.

查看原文本刊更多论文

EM算法输出的项目拟合评价：基于后验期望的RMSD指数。

在项目反应理论建模中，项目拟合分析使用后验期望，或称为伪计数，有许多优点。它们很容易从Bock-Aitkin期望最大化（EM）算法的e步输出中获得，并且即使存在缺失数据，也可以继续作为评估模型拟合的基础。本文旨在提高基于后验期望的均方根偏差（RMSD）指数的可解释性。在研究1中，我们使用两种方法评估其性能。首先，我们采用穷人的后验预测模型检验（PP-PPMC）来计算其显著性水平。由此产生的I型误差通常被控制在标称水平以下，但随着样本量的减少和测试长度的缩短，功率明显下降。其次，我们使用受试者工作特征（ROC）曲线分析（±）来经验确定在假阳性率和真阳性率之间实现最佳平衡的参考值（截止阈值）。重要的是，我们确定了模拟条件下每种样本量和测试长度组合的最佳参考值。截止阈值法比PP-PPMC法表现更好，在真阳性率方面的收益大于假阳性率膨胀带来的损失。在研究2中，我们将截止阈值方法扩展到样本量更大、测试长度更长的条件下。此外，我们评估了优化的截止阈值在不同数据缺失水平下的性能。最后，我们采用响应面分析（±）建立了一个预测模型，该模型概括了参考值随样本量和试验长度的变化方式。总体而言，本研究展示了PP-PPMC在项目拟合诊断中的应用，并实现了一种实用的频率学方法来经验推导参考值。使用我们的预测模型，从业者可以根据他们的数据集的样本大小和测试长度来计算RMSD的参考值。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Educational and Psychological Measurement 医学-数学跨学科应用

CiteScore

5.50

自引率

7.40%

发文量

审稿时长

6-12 weeks

期刊介绍： Educational and Psychological Measurement (EPM) publishes referred scholarly work from all academic disciplines interested in the study of measurement theory, problems, and issues. Theoretical articles address new developments and techniques, and applied articles deal with innovation applications.