{"title":"Validation Methods for Aggregate-Level Test Scale Linking: A Rejoinder","authors":"Andrew D. Ho, Sean F. Reardon, Demetra Kalogrides","doi":"10.3102/1076998621994540","DOIUrl":null,"url":null,"abstract":"In this issue, Reardon, Kalogrides, and Ho developed precision-adjusted random effects models to estimate aggregate-level linking error, for populations and subpopulations, for averages and progress over time. We are grateful to past editor Dan McCaffrey for selecting our paper as the focal article for a set of commentaries from our colleagues Daniel Bolt, Mark Davison, Alina von Davier, Tim Moses, and Neil Dorans. These commentaries reinforce important cautions and identify promising directions for future research. In this rejoinder, we clarify aspects of our originally proposed method. (1) Validation methods provide evidence of benefits and risks that different experts may weigh differently for different purposes. (2) Our proposed method differs from “standard mapping” procedures using the National Assessment of Educational Progress not only by using a linear (vs. equipercentile) link but also by targeting direct validity evidence about counterfactual aggregate scores. (3) Multilevel approaches that assume common score scales across states are indeed a promising next step for validation, and we hope that states enable researchers to use more of their common-core-era consortium test data for this purpose. Finally, we apply our linking method to an extended panel of data from 2009 to 2017 to show that linking recovery has remained stable.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"46 1","pages":"209 - 218"},"PeriodicalIF":1.9000,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational and Behavioral Statistics","FirstCategoryId":"102","ListUrlMain":"https://doi.org/10.3102/1076998621994540","RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 1
Abstract
In this issue, Reardon, Kalogrides, and Ho developed precision-adjusted random effects models to estimate aggregate-level linking error, for populations and subpopulations, for averages and progress over time. We are grateful to past editor Dan McCaffrey for selecting our paper as the focal article for a set of commentaries from our colleagues Daniel Bolt, Mark Davison, Alina von Davier, Tim Moses, and Neil Dorans. These commentaries reinforce important cautions and identify promising directions for future research. In this rejoinder, we clarify aspects of our originally proposed method. (1) Validation methods provide evidence of benefits and risks that different experts may weigh differently for different purposes. (2) Our proposed method differs from “standard mapping” procedures using the National Assessment of Educational Progress not only by using a linear (vs. equipercentile) link but also by targeting direct validity evidence about counterfactual aggregate scores. (3) Multilevel approaches that assume common score scales across states are indeed a promising next step for validation, and we hope that states enable researchers to use more of their Common Core-era consortium test data for this purpose. Finally, we apply our linking method to an extended panel of data from 2009 to 2017 to show that linking recovery has remained stable.
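To make the contrast between the two linking procedures concrete: a linear link matches only the first two moments (mean and standard deviation) of the state score distribution to the NAEP scale, whereas an equipercentile link matches the full quantile functions. The sketch below is a minimal illustration of a moment-matching linear link, not the authors' full precision-adjusted random effects procedure; the function name `linear_link` and all numeric moments are hypothetical, chosen only for illustration.

```python
def linear_link(state_mean, state_sd, naep_mean, naep_sd):
    """Slope and intercept mapping state-scale scores onto the NAEP scale
    by matching means and standard deviations (a linear link). An
    equipercentile link would instead match the full quantile functions.
    """
    slope = naep_sd / state_sd
    intercept = naep_mean - slope * state_mean
    return slope, intercept

# Hypothetical grade-4 math moments for one state and for NAEP; real use
# would estimate these from state assessment and NAEP administration data.
slope, intercept = linear_link(state_mean=350.0, state_sd=40.0,
                               naep_mean=240.0, naep_sd=30.0)

# Map a (hypothetical) district mean from the state scale to the NAEP scale.
district_mean_naep = intercept + slope * 365.0
print(f"slope={slope:.3f}, intercept={intercept:.1f}, "
      f"district mean on NAEP scale={district_mean_naep:.2f}")
# -> slope=0.750, intercept=-22.5, district mean on NAEP scale=251.25
```

Because only two moments are matched, the linear link is insensitive to the shape of the score distribution; that simplicity is what allows the aggregate-level validity evidence the rejoinder emphasizes.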
Journal Description
Journal of Educational and Behavioral Statistics, sponsored jointly by the American Educational Research Association and the American Statistical Association, publishes articles that are original and provide methods useful to those studying problems and issues in educational or behavioral research. Typical papers introduce new methods of analysis, establish properties of these methods, and give an example of use in education or behavioral research. Critical reviews of current practice, tutorial presentations of less well-known methods, and novel applications of already-known methods are also of interest. Papers discussing statistical techniques without specific educational or behavioral interest, or focusing on substantive results without developing new statistical methods or models or making novel use of existing methods, have lower priority. Simulation studies, whether demonstrating properties of an existing method or comparing several existing methods (without providing a new method), also have low priority.