Model Misspecification and Robustness of Observed-Score Test Equating Using Propensity Scores

IF 1.7 · Zone 3, Psychology · Q2 EDUCATION & EDUCATIONAL RESEARCH

Journal of Educational and Behavioral Statistics Pub Date : 2023-05-09 DOI:10.3102/10769986231161575

G. Wallin, M. Wiberg

Journal of Educational and Behavioral Statistics, volume 48, pages 603-635. Publication type: Journal Article.

Citations: 0

Abstract

This study explores the usefulness of covariates in equating test scores from nonequivalent test groups. The covariates are captured by an estimated propensity score, which is used as a proxy for latent ability to balance the test groups. The objective is to assess the sensitivity of the equated scores to various misspecifications in the propensity score model. The study assumes a parametric form of the propensity score and evaluates the effects of various misspecification scenarios on equating error. The results, based on both simulated and real testing data, show that (1) omitting an important covariate leads to biased estimates of the equated scores, (2) misspecifying a nonlinear relationship between the covariates and test scores increases the equating standard error in the tails of the score distributions, and (3) the equating estimators are robust against omitting a second-order term as well as using an incorrect link function in the propensity score estimation model. The findings demonstrate that auxiliary information is beneficial for test score equating in complex settings. However, they also shed light on the challenge of making fair comparisons between nonequivalent test groups in the absence of common items. The study identifies scenarios in which equating performance is acceptable and scenarios in which it is problematic, provides practical guidelines, and identifies areas for further investigation.
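To make result (1) concrete, the following is a minimal sketch, not the authors' procedure, of how omitting a covariate from a logistic propensity score model can leave test groups unbalanced. Everything in it is an assumption made for illustration: the simulated covariates `x1` and `x2`, the coefficient 0.8, the sample size, the gradient-ascent fitting routine, and the use of five propensity score strata as the balancing device.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, g, steps=200, lr=1.0):
    """Fit P(g=1 | x) = sigmoid(b0 + b'x) by batch gradient ascent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, gi in zip(X, g):
            row = [1.0] + list(x)
            resid = gi - sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j, v in enumerate(row):
                grad[j] += resid * v
        beta = [b + lr * gj / n for b, gj in zip(beta, grad)]
    return beta

def propensity_scores(beta, X):
    # Estimated probability of belonging to group 1 for each test taker.
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x))) for x in X]

def stratified_imbalance(scores, g, covariate, n_strata=5):
    """Mean absolute between-group difference in a covariate within score strata."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_strata
    diffs = []
    for s in range(n_strata):
        idx = order[s * size:(s + 1) * size] if s < n_strata - 1 else order[s * size:]
        a = [covariate[i] for i in idx if g[i] == 1]
        b = [covariate[i] for i in idx if g[i] == 0]
        if a and b:  # skip strata where one group is absent
            diffs.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sum(diffs) / len(diffs)

# Hypothetical simulated data: group membership depends on two covariates.
rng = random.Random(2023)
n = 1000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
g = [1 if rng.random() < sigmoid(0.8 * a + 0.8 * b) else 0 for a, b in zip(x1, x2)]

full = fit_logistic(list(zip(x1, x2)), g)      # correctly specified model
reduced = fit_logistic([(a,) for a in x1], g)  # misspecified: omits x2

ps_full = propensity_scores(full, list(zip(x1, x2)))
ps_reduced = propensity_scores(reduced, [(a,) for a in x1])

imbalance_full = stratified_imbalance(ps_full, g, x2)
imbalance_reduced = stratified_imbalance(ps_reduced, g, x2)
print(f"x2 imbalance, full model:    {imbalance_full:.3f}")
print(f"x2 imbalance, reduced model: {imbalance_reduced:.3f}")
```

In this toy setting the strata built from the reduced model leave a larger group difference in `x2`, and any equating carried out within those strata would inherit that imbalance. The sketch stops at covariate balance and does not compute equated scores or equating error.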


Source journal

Journal of Educational and Behavioral Statistics Multiple-

CiteScore: 4.40

Self-citation rate: 4.20%

About the journal: The Journal of Educational and Behavioral Statistics, sponsored jointly by the American Educational Research Association and the American Statistical Association, publishes original articles that provide methods useful to those studying problems and issues in educational or behavioral research. Typical papers introduce new methods of analysis, describe the properties of these methods, and give an example of their use in educational or behavioral research. Critical reviews of current practice, tutorial presentations of less well-known methods, and novel applications of already-known methods are also of interest and are sometimes accepted. Papers that discuss statistical techniques without specific educational or behavioral interest, or that focus on substantive results without developing new statistical methods or models or making novel use of existing methods, have lower priority. Simulation studies, whether to demonstrate properties of an existing method or to compare several existing methods (without providing a new method), also have low priority.