Analyzing Polytomous Test Data: A Comparison Between an Information-Based IRT Model and the Generalized Partial Credit Model
Joakim Wallmark, James O. Ramsay, Juan Li, Marie Wiberg
Journal of Educational and Behavioral Statistics, published November 6, 2023
DOI: 10.3102/10769986231207879
Abstract
Item response theory (IRT) models the relationship between the possible scores on a test item and a test taker's attainment of the latent trait that the item is intended to measure. In this study, we compare two models for tests with polytomously scored items: the optimal scoring (OS) model, a nonparametric IRT model based on the principles of information theory, and the generalized partial credit (GPC) model, a widely used parametric alternative. We evaluate these models using both simulated and real test data. In the real data examples, the OS model demonstrates superior model fit compared to the GPC model across all analyzed datasets. In our simulation study, the OS model outperforms the GPC model in terms of bias, but at the cost of larger standard errors for the probabilities along the estimated item response functions. Furthermore, we illustrate how surprisal arc length, an IRT scale-invariant measure of ability with metric properties, can be used to put scores from vastly different types of IRT models on a common scale. We also demonstrate how arc length can be a viable alternative to sum scores for scoring test takers.
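For readers unfamiliar with the two quantities compared in the abstract, the sketch below illustrates the GPC model's category response function in Muraki's (1992) standard parameterization and a numerical approximation of surprisal arc length. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter values, the theta grid, and the base-M logarithm convention for surprisal (following Ramsay's optimal scoring work) are assumptions made for demonstration.

```python
# Minimal sketch, assuming Muraki's (1992) GPC parameterization and
# base-M surprisal W = -log_M P; parameter values are illustrative only.
import numpy as np

def gpc_probs(theta, a, b):
    """Category probabilities P_k(theta) for one polytomous item.

    a : discrimination parameter
    b : array of step (threshold) parameters, one per category above 0
    Returns an array of shape (len(theta), len(b) + 1).
    """
    theta = np.atleast_1d(theta)
    # Cumulative numerators sum_{v<=k} a*(theta - b_v); the empty sum
    # for category 0 is zero.
    steps = a * (theta[:, None] - b[None, :])            # (n_theta, m)
    cum = np.concatenate([np.zeros((len(theta), 1)),
                          np.cumsum(steps, axis=1)], axis=1)
    num = np.exp(cum - cum.max(axis=1, keepdims=True))   # stabilized softmax
    return num / num.sum(axis=1, keepdims=True)

def surprisal_arc_length(theta_grid, prob_list, M):
    """Total arc length of the stacked surprisal curves W = -log_M P
    over a theta grid, accumulated across all items' categories."""
    W = np.hstack([-np.log(P) / np.log(M) for P in prob_list])
    dW = np.gradient(W, theta_grid, axis=0)              # dW/dtheta
    speed = np.sqrt((dW ** 2).sum(axis=1))               # curve speed
    return np.trapz(speed, theta_grid)

theta = np.linspace(-3, 3, 301)
P = gpc_probs(theta, a=1.2, b=np.array([-1.0, 0.0, 1.5]))  # 4 categories
print(surprisal_arc_length(theta, [P], M=4))
```

Because arc length is computed from the surprisal curves rather than from the latent-trait parameterization itself, it is unchanged under monotone rescalings of theta, which is what allows scores from quite different IRT models to be placed on a common scale.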
Journal Description
Journal of Educational and Behavioral Statistics, sponsored jointly by the American Educational Research Association and the American Statistical Association, publishes articles that are original and provide methods useful to those studying problems and issues in educational or behavioral research. Typical papers introduce new methods of analysis, establish properties of these methods, and give an example of use in education or behavioral research. Critical reviews of current practice, tutorial presentations of less well-known methods, and novel applications of already-known methods are also of interest. Papers discussing statistical techniques without specific educational or behavioral interest, or focusing on substantive results without developing new statistical methods or models or making novel use of existing methods, have lower priority. Simulation studies, either to demonstrate properties of an existing method or to compare several existing methods (without providing a new method), also have low priority.