Evaluating Equating Transformations in IRT Observed-Score and Kernel Equating Methods.

IF 1.2 · CAS Zone 4 (Psychology) · JCR Q4 (PSYCHOLOGY, MATHEMATICAL)

Applied Psychological Measurement · Pub Date: 2023-03-01 · Epub Date: 2022-10-04 · DOI: 10.1177/01466216221124087

Waldir Leôncio, Marie Wiberg, Michela Battauz

Volume 47, Issue 2, pp. 123-140. PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/74/30/10.1177_01466216221124087.PMC9979196.pdf

Citation count: 0

Abstract

Test equating is a statistical procedure for ensuring that scores from different test forms can be used interchangeably. Several methodologies are available for equating, some based on the Classical Test Theory (CTT) framework and others on the Item Response Theory (IRT) framework. This article compares equating transformations originating from three different frameworks, namely IRT Observed-Score Equating (IRTOSE), Kernel Equating (KE), and IRT Kernel Equating (IRTKE). The comparisons were made under different data-generating scenarios, including a novel data-generation procedure that simulates test data without relying on IRT parameters while still providing control over test score properties such as distribution skewness and item difficulty. Our results suggest that IRT methods tend to provide better results than KE even when the data are not generated from IRT processes. KE might be able to provide satisfactory results if a proper pre-smoothing solution can be found, while also being much faster than IRT methods. For practical applications, we recommend examining how sensitive the results are to the choice of equating method, while keeping in mind the importance of good model fit and of meeting the assumptions of the framework.
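To make the idea of an equating transformation concrete, here is a minimal sketch of textbook Gaussian kernel equating for an equivalent-groups design. It is not the authors' code. The function names, the fixed bandwidth of 0.6, and the bisection bracket are assumptions chosen for illustration, and the pre-smoothing step that the abstract flags as important for KE is omitted: the score probabilities are taken as given.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def continuized_cdf(x, scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution.

    scores: possible test scores; probs: their probabilities (sum to 1);
    h: kernel bandwidth (an assumed value, not an optimized one).
    """
    mu = sum(p * s for s, p in zip(scores, probs))
    var = sum(p * (s - mu) ** 2 for s, p in zip(scores, probs))
    # The factor a rescales the kernel so the continuized distribution
    # keeps the mean and variance of the discrete one.
    a = math.sqrt(var / (var + h * h))
    return sum(
        p * normal_cdf((x - a * s - (1.0 - a) * mu) / (a * h))
        for s, p in zip(scores, probs)
    )


def kernel_equate(x, scores_x, probs_x, scores_y, probs_y, h_x=0.6, h_y=0.6):
    """Map a form-X score x to the form-Y scale: G_h^{-1}(F_h(x))."""
    target = continuized_cdf(x, scores_x, probs_x, h_x)
    # Invert the continuized form-Y CDF by bisection; the bracket is
    # padded well beyond the score range (an illustrative choice).
    lo, hi = min(scores_y) - 10.0, max(scores_y) + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if continuized_cdf(mid, scores_y, probs_y, h_y) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `kernel_equate(x, scores_x, probs_x, scores_y, probs_y)` over every form-X score gives the full transformation. In practice the bandwidths are usually selected by a data-driven criterion rather than fixed, and the probabilities come from a pre-smoothed (for example log-linear) model.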


Source journal

Applied Psychological Measurement Multiple-

CiteScore: 2.30

Self-citation rate: 8.30%

About the journal: Applied Psychological Measurement publishes empirical research on the application of techniques of psychological measurement to substantive problems in all areas of psychology and related disciplines.