Volatility in School Test Scores: Implications for Test-Based Accountability Systems

Thomas J. Kane, D. Staiger
Brookings Papers on Education Policy, 2002, pp. 235-283. DOI: 10.1353/PEP.2002.0010
Citations: 275

Abstract

By the spring of 2000, forty states had begun using student test scores to rate school performance. Twenty states have gone a step further and are attaching explicit monetary rewards or sanctions to a school's test performance. For example, California planned to spend $677 million on teacher incentives in 2001, providing bonuses of up to $25,000 to teachers in schools with the largest test score gains. We highlight an under-appreciated weakness of school accountability systems, the volatility of test score measures, and explore the implications of that volatility for the design of school accountability systems. The imprecision of test score measures arises from two sources. The first is sampling variation, which is a particularly striking problem in elementary schools. With the average elementary school containing only sixty-eight students per grade level, the amount of variation stemming from the idiosyncrasies of the particular sample of students being tested is often large relative to the total amount of variation observed between schools. The second arises from one-time factors that are not sensitive to the size of the sample; for example, a dog barking in the playground on the day of the test, a severe flu season, a disruptive student in a class, or favorable chemistry between a group of students and their teacher. Both small samples and other one-time factors can add considerable volatility to test score measures.
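The two sources of imprecision named in the abstract can be made concrete with a simple variance decomposition: the variance of a school's observed mean score is roughly the student-level variance divided by the number of students tested, plus the variance of a school-wide one-time shock, which does not shrink as the sample grows. The sketch below is an illustrative model only, not the authors' estimation method. It assumes student scores are independent normal draws around the school's true mean with standard deviation `sd_student`, and that a one-time shock with standard deviation `sd_shock` shifts every tested student's score equally. All function names and parameter values are hypothetical.

```python
import random
import statistics


def sampling_variance(n_students: int, sd_student: float) -> float:
    """Variance of a school's mean score caused only by which students are tested."""
    return sd_student ** 2 / n_students


def total_noise_variance(n_students: int, sd_student: float, sd_shock: float) -> float:
    """Sampling variance plus a one-time school-wide shock whose variance ignores n."""
    return sampling_variance(n_students, sd_student) + sd_shock ** 2


def noise_share(n_students: int, sd_student: float, sd_shock: float,
                sd_true_school: float) -> float:
    """Fraction of observed between-school variance that is noise, not persistent
    differences (assumes true school effects are independent of the noise)."""
    noise = total_noise_variance(n_students, sd_student, sd_shock)
    return noise / (sd_true_school ** 2 + noise)


def simulate_school_means(n_students: int, sd_student: float, sd_shock: float,
                          n_years: int, seed: int = 0) -> list[float]:
    """Simulate one school's mean score over n_years; its true mean is fixed at 0,
    so all year-to-year movement is noise."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_years):
        shock = rng.gauss(0.0, sd_shock)  # same one-time shock for every student this year
        scores = [rng.gauss(0.0, sd_student) + shock for _ in range(n_students)]
        means.append(statistics.fmean(scores))
    return means
```

For example, with scores measured in student-level standard deviation units (`sd_student = 1.0`) and the abstract's sixty-eight students per grade, `sampling_variance(68, 1.0)` is 1/68, about 0.015. Quadrupling the tested cohort to 272 cuts that term by a factor of four, while the `sd_shock ** 2` term stays the same, which is why one-time factors are described as insensitive to sample size.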