Evaluating the Quantity-Quality Trade-off in the Selection of Anchor Items: A Vertical Scaling Approach
Florian Pibal, H. Cesnik
Practical Assessment, Research and Evaluation, April 1, 2011. DOI: 10.7275/NNCY-EW26
When administering tests across grades, vertical scaling is often employed to place scores from different tests on a common overall scale so that test-takers’ progress can be tracked. To link results across grades, however, common items are needed that appear in both test forms. The literature shows no clear agreement on the ideal number of common items. In line with some scholars, we argue that a greater number of anchor items carries a higher risk of unwanted effects such as displacement, item drift, or undesired fit statistics, and that a smaller set of psychometrically well-functioning anchor items can sometimes be preferable. To demonstrate this, we conducted a study in which a reading-comprehension test was administered to 1,350 test-takers across grades 6 to 8. Using a step-by-step approach, we found that the paradox of high item drift in cross-grade test administrations can be mitigated and eventually even eliminated. A positive side effect was an increase in the explanatory power of the empirical data. Moreover, we found that scaling adjustment can be used to evaluate the effectiveness of a vertical scaling approach and, in certain cases, can yield more accurate results than the use of calibrated anchor items.
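The quantity-quality trade-off described above can be sketched in code. The following is a minimal illustration (not the authors' procedure): two grade-level Rasch calibrations are linked through their common items via mean/mean linking, and anchors whose between-grade displacement exceeds a threshold are dropped first. The item labels, difficulty values, and the 0.5-logit drift threshold are all assumptions chosen for the example.

```python
def linking_constant(b_lower, b_upper, drift_threshold=0.5):
    """Place the upper-grade scale onto the lower-grade scale.

    b_lower, b_upper: dicts mapping anchor-item IDs to difficulty
    estimates (in logits) from separate calibrations of each form.
    Anchors whose between-grade displacement exceeds drift_threshold
    are excluded before linking, reflecting the idea that fewer,
    well-functioning anchors can yield a more accurate link.
    """
    common = sorted(set(b_lower) & set(b_upper))
    kept = [i for i in common
            if abs(b_upper[i] - b_lower[i]) <= drift_threshold]
    if not kept:
        raise ValueError("no stable anchor items left to link with")
    # Mean/mean linking: the constant is the mean difficulty shift
    # over the retained anchors.
    shift = sum(b_lower[i] - b_upper[i] for i in kept) / len(kept)
    return shift, kept

# Hypothetical anchor difficulties (logits) for two adjacent grades;
# item A4 drifts by 0.70 logits and is therefore excluded.
b6 = {"A1": -0.40, "A2": 0.10, "A3": 0.55, "A4": 1.20}
b7 = {"A1": -0.85, "A2": -0.35, "A3": 0.08, "A4": 1.90}

shift, kept = linking_constant(b6, b7)
print(kept)                 # A4 dropped for drift
print(round(shift, 3))      # linking constant from the stable anchors
```

Adding the returned constant to every upper-grade estimate places both grades on the lower grade's scale; dropping the drifting anchor keeps that constant from being distorted by a single misbehaving item.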