Change in Engagement During Test Events: An Argument for Weighted Scoring?
Steven L. Wise, G. Gage Kingsbury, Meredith L. Langi
Applied Measurement in Education, published 2023-11-06. DOI: 10.1080/08957347.2023.2274568
Abstract
Recent research has provided evidence that performance change during a student’s test event can indicate the presence of test-taking disengagement. Meaningful performance change implies that some portions of the test event reflect assumed maximum performance better than others and, because disengagement tends to diminish performance, lower-performing portions are less likely to reflect maximum performance than higher-performing portions. This empirical study explored the use of differential weighting of item responses during scoring, with weighting schemes representing either declining or increasing performance. Results indicated that weighted scoring could substantially decrease the score distortion due to disengagement factors and thereby improve test score validity. The study findings support the use of scoring procedures that manage disengagement by adapting to student test-taking behavior.

Disclosure statement: The authors have no known conflicts of interest to disclose.

Notes
1. What constitutes “construct-irrelevant” depends on how the target construct is conceptualized. For example, Borgonovi and Biecek (2016) argued that academic endurance should be considered part of what PISA is intended to measure, because academic endurance is positively associated with a student’s success later in life. It is unclear, however, how universally this conceptualization is adopted by those interpreting PISA results.
2. Such comparisons between first-half and second-half test performance require the assumption that the two halves are reasonably equivalent in terms of content representation if IRT-based scoring is used.
3. Half-test MLE standard errors in Math and Reading were around 4.2 and 4.8, respectively.
4. These intervals are not intended to correspond to the critical regions used to assess statistical significance under the AMC method. For example, classifying PD < -10 points as a large decline represents a less conservative criterion than the critical region used by Wise and Kingsbury (2022).
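The abstract does not specify the IRT model or the exact weighting function, so the following is only a minimal sketch of what differential weighting of item responses during MLE scoring could look like. It assumes a Rasch (1PL) model with known item difficulties and uses hypothetical linear position weights (`position_weights`, `spread`) that emphasize early items for a declining-performance pattern or later items for an increasing-performance pattern. Each item's log-likelihood term is multiplied by its weight, and the ability estimate is found by Newton-Raphson. This is an illustration, not the study's scoring procedure.

```python
import math
from typing import List, Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (1PL) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def position_weights(n: int, pattern: str, spread: float = 0.5) -> List[float]:
    """Hypothetical linear item-position weights whose mean is 1.0.

    "decline":  early items weighted more (1 + spread down to 1 - spread)
    "increase": later items weighted more (1 - spread up to 1 + spread)
    "none":     equal weights, i.e. ordinary unweighted MLE scoring
    """
    if pattern not in ("none", "decline", "increase"):
        raise ValueError(f"unknown pattern: {pattern!r}")
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1) so every weight stays positive")
    if pattern == "none" or n == 1:
        return [1.0] * n
    declining = [1.0 + spread - 2.0 * spread * i / (n - 1) for i in range(n)]
    return declining if pattern == "decline" else declining[::-1]


def weighted_mle(responses: Sequence[int], difficulties: Sequence[float],
                 weights: Sequence[float], theta0: float = 0.0,
                 max_iter: int = 50, tol: float = 1e-8) -> float:
    """Maximize the weighted Rasch log-likelihood by Newton-Raphson.

    log L(theta) = sum_i w_i * [u_i * log P_i + (1 - u_i) * log(1 - P_i)]
    d/dtheta     = sum_i w_i * (u_i - P_i)
    d2/dtheta2   = -sum_i w_i * P_i * (1 - P_i)
    """
    if not (len(responses) == len(difficulties) == len(weights)):
        raise ValueError("responses, difficulties and weights must have equal length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    if all(u == 1 for u in responses) or all(u == 0 for u in responses):
        # No finite MLE exists for all-correct or all-incorrect patterns.
        raise ValueError("pattern must contain both correct and incorrect answers")
    theta = theta0
    for _ in range(max_iter):
        grad = 0.0
        info = 0.0
        for u, b, w in zip(responses, difficulties, weights):
            p = rasch_prob(theta, b)
            grad += w * (u - p)
            info += w * p * (1.0 - p)
        step = grad / info  # Newton step: -gradient / second derivative
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    # Made-up 20-item pattern: strong first half, weak second half.
    responses = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
                 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    difficulties = [0.0] * len(responses)
    for pattern in ("none", "decline", "increase"):
        w = position_weights(len(responses), pattern)
        print(pattern, round(weighted_mle(responses, difficulties, w), 3))
```

With equal weights this reduces to ordinary MLE (here 0.0, since half the items are correct). The declining-weight scheme gives a higher estimate (about 0.49) because it leans on the higher-performing early portion of the test event, which mirrors the abstract's argument that lower-performing portions are less likely to reflect maximum performance. The linear shape and the `spread` value are arbitrary choices made for this illustration.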
Journal description:
Because interaction between the domains of research and application is critical to the evaluation and improvement of new educational measurement practices, the prime objective of Applied Measurement in Education is to improve communication between academicians and practitioners. To help bridge the gap between theory and practice, articles in this journal describe original research studies, innovative strategies for solving educational measurement problems, and integrative reviews of current approaches to contemporary measurement issues. Peer Review Policy: All review papers in this journal have undergone editorial screening and peer review.