Accuracy of progress monitoring decision rules to evaluate response to instruction with two computer adaptive tests

IF 4.1 | Zone 1, Psychology | Q1 PSYCHOLOGY, SOCIAL

Journal of School Psychology Pub Date : 2024-05-14 DOI:10.1016/j.jsp.2024.101319

Ethan R. Van Norman, Emily R. Forcht

Journal of School Psychology, Volume 105, Article 101319. Not open access. https://www.sciencedirect.com/science/article/pii/S0022440524000396

Citations: 0

Abstract

Computer adaptive tests have become popular assessments to screen students for academic risk. Research is emerging regarding their use as progress monitoring tools to measure response to instruction. We evaluated the accuracy of the trend-line decision rule when applied to outcomes from a frequently used reading computer adaptive test (i.e., Star Reading [SR]) and a frequently used math computer adaptive test (i.e., Star Math [SM]). Analyses of extant SR and SM data were conducted to inform conditions for simulations to determine the number of assessments required to yield sufficient sensitivity (i.e., probability of recommending an instructional change when a change was warranted) and specificity (i.e., probability of recommending maintaining an intervention when a change was not warranted) when comparing performance to goal lines based upon a future target score (i.e., benchmark) as well as normative comparisons (50th and 75th percentiles). The extant dataset of SR outcomes consisted of monthly progress monitoring data from 993 Grade 3, 804 Grade 4, and 709 Grade 5 students from multiple states in the northwestern United States. Data for SM were also drawn from the northwest and contained outcomes from 518 Grade 3, 474 Grade 4, and 391 Grade 5 students. Grade-level samples were predominantly White (range = 59.89%–67.72%), followed by Latinx (range = 9.65%–15.94%). Results of simulations suggest that when data were collected once a month, seven, eight, and nine observations were required to support low-stakes decisions with SR for Grades 3, 4, and 5, respectively. For SM, nine, ten, and eight observations were required for Grades 3, 4, and 5, respectively. Given the length of time required to support reasonably accurate decisions, recommendations to consider other types of assessments and decision-making frameworks for academic progress monitoring are provided.
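The abstract does not spell out how the trend-line rule or the accuracy metrics were computed, so the following is only a minimal sketch under stated assumptions: the trend line is an ordinary least-squares slope over equally spaced (e.g., monthly) observations, the goal line runs from a baseline score to a target score, and a change is recommended when the observed slope falls below the goal-line slope. The function names (`ols_slope`, `recommend_change`, `sensitivity_specificity`) and all numbers are hypothetical illustrations, not the authors' procedure or data.

```python
from statistics import mean


def ols_slope(scores):
    """Least-squares slope of scores against observation index 0, 1, 2, ..."""
    xs = range(len(scores))
    x_bar, y_bar = mean(xs), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, scores))
    return sxy / sxx


def recommend_change(scores, baseline, goal, n_periods):
    """Assumed trend-line rule: recommend an instructional change when the
    observed growth rate is below the rate needed to reach the goal."""
    goal_slope = (goal - baseline) / n_periods
    return ols_slope(scores) < goal_slope


def sensitivity_specificity(decisions, truths):
    """decisions[i]: rule recommended a change; truths[i]: change was warranted.

    Sensitivity = P(recommend change | change warranted).
    Specificity = P(recommend maintaining | change not warranted).
    """
    pairs = list(zip(decisions, truths))
    tp = sum(1 for d, t in pairs if d and t)
    fn = sum(1 for d, t in pairs if not d and t)
    tn = sum(1 for d, t in pairs if not d and not t)
    fp = sum(1 for d, t in pairs if d and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical student: four monthly scores growing 2 points per month.
scores = [100, 102, 104, 106]
print(ols_slope(scores))
print(recommend_change(scores, baseline=100, goal=130, n_periods=9))
```

In a simulation of the kind the abstract describes, each simulated student has a known true growth rate, so `truths` is known by construction. The rule can then be applied after each additional observation to find how many observations are needed before both metrics reach an acceptable level.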


Source journal

Journal of School Psychology (PSYCHOLOGY, EDUCATIONAL)

CiteScore: 6.70

Self-citation rate: 8.00%

About the journal: The Journal of School Psychology publishes original empirical articles and critical reviews of the literature on research and practices relevant to psychological and behavioral processes in school settings. JSP presents research on intervention mechanisms and approaches; schooling effects on the development of social, cognitive, mental-health, and achievement-related outcomes; assessment; and consultation. Submissions from a variety of disciplines are encouraged. All manuscripts are read by the Editor and one or more editorial consultants with the intent of providing appropriate and constructive written reviews.