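Does the choice of response time threshold procedure substantially affect inferences concerning the identification and exclusion of rapid guessing responses? A meta-analysis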

Rios, Joseph A., Deng, Jiayi
Large-Scale Assessments in Education · Journal Article · Published 2021-08-17
Impact factor 2.6 · JCR Q1 (Education & Educational Research)
DOI: 10.1186/s40536-021-00110-8 (https://doi.org/10.1186/s40536-021-00110-8)
Citations: 0

Abstract

Does the choice of response time threshold procedure substantially affect inferences concerning the identification and exclusion of rapid guessing responses? A meta-analysis

Background

In testing contexts that are predominantly concerned with power, rapid guessing (RG) has the potential to undermine the validity of inferences made from educational assessments, as such responses do not reflect the knowledge, skills, and abilities assessed. Given this concern, practitioners and researchers have used a multitude of response time threshold procedures that classify RG responses in these contexts based on no empirical data (e.g., an arbitrary time limit), response time distributions, or a combination of response time and accuracy information. As there is little understanding of how these procedures compare to each other, this meta-analysis sought to investigate whether threshold typology is related to differences in descriptive, measurement property, and performance outcomes in these contexts.
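To make two of the threshold types named above concrete, here is a minimal sketch. It is not the procedure of any study in the meta-analysis: the 3-second cutoff, the 10% of mean response time rule, the 10-second cap, the function names, and the response times are all illustrative assumptions.

```python
from statistics import mean


def fixed_threshold(seconds=3.0):
    """No-empirical-data threshold: one arbitrary cutoff for every item (assumed 3 s)."""
    return seconds


def normative_threshold(item_rts, fraction=0.10, cap=10.0):
    """Distribution-based threshold: a fraction of the item's mean RT, capped (assumed values)."""
    return min(fraction * mean(item_rts), cap)


def flag_rapid_guesses(item_rts, threshold):
    """Flag a response as a rapid guess when its RT falls below the threshold."""
    return [rt < threshold for rt in item_rts]


# Made-up response times (seconds) from five examinees on one item.
item_rts = [2.0, 3.5, 40.0, 55.0, 99.5]

fixed_flags = flag_rapid_guesses(item_rts, fixed_threshold())
norm_flags = flag_rapid_guesses(item_rts, normative_threshold(item_rts))

print(fixed_flags)  # [True, False, False, False, False]
print(norm_flags)   # [True, True, False, False, False]
```

On this toy item the distribution-based cutoff (4.0 s) flags one more response than the fixed cutoff, which shows how two procedures can classify the same data differently.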

Methods

Studies were sampled that: (a) employed two or more response time (RT) threshold procedures to identify and exclude RG responses on the same computer-administered low-stakes power test; and (b) evaluated differences between procedures on the proportion of RG responses and responders, measurement properties, and test performance.

Results

Based on as many as 86 effect sizes, our findings indicated non-negligible differences between RT threshold procedures in the proportion of RG responses and responders. The largest differences for these outcomes were observed between procedures using no empirical data and those relying on response time and accuracy information. However, these differences were not related to variability in aggregate-level measurement properties and test performance.
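The two descriptive outcomes mentioned above, the proportion of RG responses and the proportion of RG responders, can be sketched as follows. The data, the per-item thresholds, and the rule that an examinee counts as an RG responder when more than 10% of their responses are flagged are hypothetical choices for illustration, not values taken from the paper.

```python
def rg_proportions(rt_matrix, thresholds, responder_cutoff=0.10):
    """Return (proportion of RG responses, proportion of RG responders).

    rt_matrix[e][i] is the response time of examinee e on item i;
    thresholds[i] is the cutoff for item i. responder_cutoff is an assumed rule.
    """
    flags = [[rt < thr for rt, thr in zip(row, thresholds)] for row in rt_matrix]
    n_responses = sum(len(row) for row in flags)
    prop_responses = sum(sum(row) for row in flags) / n_responses
    n_responders = sum(sum(row) / len(row) > responder_cutoff for row in flags)
    return prop_responses, n_responders / len(flags)


# Made-up response times (seconds): 4 examinees x 3 items.
rt_matrix = [
    [2.0, 25.0, 30.0],
    [3.5, 4.0, 28.0],
    [40.0, 22.0, 35.0],
    [55.0, 30.0, 2.5],
]

procedure_a = rg_proportions(rt_matrix, [3.0, 3.0, 3.0])  # same arbitrary cutoff per item
procedure_b = rg_proportions(rt_matrix, [4.0, 5.0, 6.0])  # hypothetical item-specific cutoffs

# Differences between procedures in the two descriptive outcomes.
diff_responses = procedure_b[0] - procedure_a[0]
diff_responders = procedure_b[1] - procedure_a[1]
print(procedure_a, procedure_b, diff_responses, diff_responders)
```

A meta-analysis would pool many such between-procedure differences as effect sizes; this sketch only computes the raw difference for one toy data set.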

Conclusions

When filtering RG responses to improve inferences concerning item properties and group score outcomes, the actual threshold procedure chosen may be of less importance than the act of identifying such deleterious responses. However, because RT thresholds that use no empirical data are conservative and may underclassify RG, practitioners may wish to avoid these procedures when making inferences at the individual level.

Source journal: Large-Scale Assessments in Education (Social Sciences, Education)
CiteScore: 4.30
Self-citation rate: 6.50%
Articles published: 16
Review time: 13 weeks