测试过程中参与度的变化:加权评分的争论?

IF 1.1 4区教育学 Q3 EDUCATION & EDUCATIONAL RESEARCH

Applied Measurement in Education Pub Date : 2023-11-06 DOI:10.1080/08957347.2023.2274568

Steven L. Wise, G. Gage Kingsbury, Meredith L. Langi

{"title":"测试过程中参与度的变化:加权评分的争论?","authors":"Steven L. Wise, G. Gage Kingsbury, Meredith L. Langi","doi":"10.1080/08957347.2023.2274568","DOIUrl":null,"url":null,"abstract":"ABSTRACTRecent research has provided evidence that performance change during a student’s test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance, lower-performing portions are less likely to reflect maximum performance than higher-performing portions. This empirical study explored the use of differential weighting of item responses during scoring, with weighting schemes representing either declining or increasing performance. Results indicated that weighted scoring could substantially decrease the score distortion due to disengagement factors and thereby improve test score validity. The study findings support the use of scoring procedures that manage disengagement by adapting to student test-taking behavior. Disclosure statementThe authors have no known conflicts of interest to disclose.Notes1 What constitutes “construct-irrelevant” depends on how the target construct is conceptualized. For example, Borgonovi and Biecek (Citation2016) argued that academic endurance should be considered part of what PISA is intended to measure, because academic endurance is positively associated with a student’s success later in life. It is unclear, however, how universally this conceptualization is adopted by those interpreting PISA results.2 Such comparisons between first and second half test performance require the assumption that the two halves are reasonably equivalent in terms of content representation if IRT-based scoring is used.3 Half test MLE standard errors in Math and Reading were around 4.2 and 4.8, respectively.4 These intervals are not intended to correspond to the critical regions used to assess statistical significance under the AMC method. For example, classifying PD < -10 points as a large decline represents a less conservative criterion than the critical region used by Wise and Kingsbury (Citation2022).","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"21 3 1","pages":"0"},"PeriodicalIF":1.1000,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Change in Engagement During Test Events: An Argument for Weighted Scoring?\",\"authors\":\"Steven L. Wise, G. Gage Kingsbury, Meredith L. Langi\",\"doi\":\"10.1080/08957347.2023.2274568\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACTRecent research has provided evidence that performance change during a student’s test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance, lower-performing portions are less likely to reflect maximum performance than higher-performing portions. This empirical study explored the use of differential weighting of item responses during scoring, with weighting schemes representing either declining or increasing performance. Results indicated that weighted scoring could substantially decrease the score distortion due to disengagement factors and thereby improve test score validity. The study findings support the use of scoring procedures that manage disengagement by adapting to student test-taking behavior. Disclosure statementThe authors have no known conflicts of interest to disclose.Notes1 What constitutes “construct-irrelevant” depends on how the target construct is conceptualized. For example, Borgonovi and Biecek (Citation2016) argued that academic endurance should be considered part of what PISA is intended to measure, because academic endurance is positively associated with a student’s success later in life. It is unclear, however, how universally this conceptualization is adopted by those interpreting PISA results.2 Such comparisons between first and second half test performance require the assumption that the two halves are reasonably equivalent in terms of content representation if IRT-based scoring is used.3 Half test MLE standard errors in Math and Reading were around 4.2 and 4.8, respectively.4 These intervals are not intended to correspond to the critical regions used to assess statistical significance under the AMC method. For example, classifying PD < -10 points as a large decline represents a less conservative criterion than the critical region used by Wise and Kingsbury (Citation2022).\",\"PeriodicalId\":51609,\"journal\":{\"name\":\"Applied Measurement in Education\",\"volume\":\"21 3 1\",\"pages\":\"0\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2023-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Measurement in Education\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1080/08957347.2023.2274568\",\"RegionNum\":4,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Measurement in Education","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/08957347.2023.2274568","RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}

引用次数: 0

摘要

摘要最近的研究提供了证据，表明学生在考试过程中的表现变化可以表明考试脱离的存在。有意义的性能变化意味着测试事件的某些部分比其他部分更好地反映了假定的最大性能，并且，因为脱离会降低性能，较低性能的部分比较高性能的部分更不可能反映最大性能。本实证研究探讨了在评分过程中对项目反应的差异加权的使用，加权方案代表了绩效的下降或提高。结果表明，采用加权评分法可以有效地减少因脱离参与因素而造成的分数失真，从而提高测试分数的效度。研究结果支持使用评分程序，通过适应学生的考试行为来管理脱离。声明作者无已知利益冲突需要披露。注1“无关结构”的构成取决于目标结构是如何概念化的。例如，Borgonovi和Biecek (Citation2016)认为，学业耐力应该被视为PISA旨在衡量的一部分，因为学业耐力与学生以后的成功呈正相关。然而，目前尚不清楚，解释PISA结果的人是否普遍采用了这一概念如果使用基于irt的评分，对前半部分和后半部分测试性能的比较需要假设这两部分在内容表示方面是相当的数学和阅读的一半测试MLE标准误差分别在4.2和4.8左右这些区间不打算对应于在AMC方法下用于评估统计显著性的关键区域。例如，与Wise和Kingsbury (Citation2022)使用的临界区域相比，将PD < -10点分类为大幅下降的标准就不那么保守了。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Change in Engagement During Test Events: An Argument for Weighted Scoring?

ABSTRACTRecent research has provided evidence that performance change during a student’s test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance, lower-performing portions are less likely to reflect maximum performance than higher-performing portions. This empirical study explored the use of differential weighting of item responses during scoring, with weighting schemes representing either declining or increasing performance. Results indicated that weighted scoring could substantially decrease the score distortion due to disengagement factors and thereby improve test score validity. The study findings support the use of scoring procedures that manage disengagement by adapting to student test-taking behavior. Disclosure statementThe authors have no known conflicts of interest to disclose.Notes1 What constitutes “construct-irrelevant” depends on how the target construct is conceptualized. For example, Borgonovi and Biecek (Citation2016) argued that academic endurance should be considered part of what PISA is intended to measure, because academic endurance is positively associated with a student’s success later in life. It is unclear, however, how universally this conceptualization is adopted by those interpreting PISA results.2 Such comparisons between first and second half test performance require the assumption that the two halves are reasonably equivalent in terms of content representation if IRT-based scoring is used.3 Half test MLE standard errors in Math and Reading were around 4.2 and 4.8, respectively.4 These intervals are not intended to correspond to the critical regions used to assess statistical significance under the AMC method. For example, classifying PD < -10 points as a large decline represents a less conservative criterion than the critical region used by Wise and Kingsbury (Citation2022).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Applied Measurement in Education Multiple-

CiteScore

2.50

自引率

13.30%

发文量

期刊介绍： Because interaction between the domains of research and application is critical to the evaluation and improvement of new educational measurement practices, Applied Measurement in Education" prime objective is to improve communication between academicians and practitioners. To help bridge the gap between theory and practice, articles in this journal describe original research studies, innovative strategies for solving educational measurement problems, and integrative reviews of current approaches to contemporary measurement issues. Peer Review Policy: All review papers in this journal have undergone editorial screening and peer review.