Checkbox grading of large-scale mathematics exams with multiple assessors: Field study on assessors’ inter-rater reliability, time investment and usage experience
{"title":"Checkbox grading of large-scale mathematics exams with multiple assessors: Field study on assessors’ inter-rater reliability, time investment and usage experience","authors":"Filip Moons , Ellen Vandervieren , Jozef Colpaert","doi":"10.1016/j.stueduc.2024.101443","DOIUrl":null,"url":null,"abstract":"<div><div>Assessing exams with multiple assessors is challenging regarding inter-rater reliability and feedback. This paper presents ‘checkbox grading,’ a digital method where exam designers have predefined checkboxes with both feedback and associated partial grades. Assessors then tick the checkboxes relevant to a student solution. Dependencies between checkboxes ensure consistency among assessors in following the grading scheme. Moreover, the approach supports ‘blind grading’ by hiding the grades associated with the checkboxes, thus focusing assessors on the criteria rather than the scores. The approach was studied during a large-scale mathematics state exam. Results show that assessors perceived checkbox grading as very useful. However, compared to traditional grading—where assessors follow a correction scheme and communicate the resulting grade—more time is spent on checkbox grading, while both approaches are equally reliable. Blind grading improved inter-rater reliability for some tasks. 
Overall, checkbox grading might lead to a smoother process where feedback, not solely grades, is communicated to students.</div></div>","PeriodicalId":47539,"journal":{"name":"Studies in Educational Evaluation","volume":"85 ","pages":"Article 101443"},"PeriodicalIF":2.6000,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in Educational Evaluation","FirstCategoryId":"95","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0191491X24001299","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
引用次数: 0
Abstract
Assessing exams with multiple assessors is challenging regarding inter-rater reliability and feedback. This paper presents ‘checkbox grading,’ a digital method in which exam designers predefine checkboxes that combine feedback with associated partial grades. Assessors then tick the checkboxes relevant to a student solution. Dependencies between checkboxes ensure consistency among assessors in following the grading scheme. Moreover, the approach supports ‘blind grading’ by hiding the grades associated with the checkboxes, thus focusing assessors on the criteria rather than the scores. The approach was studied during a large-scale mathematics state exam. Results show that assessors perceived checkbox grading as very useful. However, compared to traditional grading—where assessors follow a correction scheme and communicate the resulting grade—more time is spent on checkbox grading, while both approaches are equally reliable. Blind grading improved inter-rater reliability for some tasks. Overall, checkbox grading might lead to a smoother process where feedback, not solely grades, is communicated to students.
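The mechanism described in the abstract—predefined checkboxes carrying both feedback and partial grades, dependency constraints between checkboxes, and an optional blind mode that hides scores—can be illustrated with a minimal sketch. This is not the authors' tool; all names (`Checkbox`, `grade`, the example rubric) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Checkbox:
    """One predefined grading criterion (illustrative sketch, not the paper's implementation)."""
    label: str                                     # feedback text attached to this criterion
    points: float                                  # partial grade awarded when ticked
    requires: list = field(default_factory=list)   # ids of checkboxes that must also be ticked

def grade(checkboxes: dict, ticked: set, blind: bool = False):
    """Enforce dependencies, then return (total, feedback).

    With blind=True the per-solution total is withheld, mirroring 'blind grading':
    the assessor sees only the criteria, not the resulting score.
    """
    # Dependency constraints keep assessors consistent with the grading scheme:
    # a checkbox may only be ticked if everything it depends on is also ticked.
    for cid in ticked:
        for dep in checkboxes[cid].requires:
            if dep not in ticked:
                raise ValueError(f"'{cid}' requires '{dep}' to be ticked")
    total = sum(checkboxes[c].points for c in ticked)
    feedback = [checkboxes[c].label for c in ticked]
    return (None if blind else total), feedback

# Hypothetical rubric for one exam task:
scheme = {
    "method": Checkbox("Correct solution method chosen", 1.0),
    "algebra": Checkbox("Algebraic steps carried out correctly", 1.0, requires=["method"]),
    "answer": Checkbox("Final answer correct", 0.5, requires=["algebra"]),
}

total, fb = grade(scheme, {"method", "algebra"})
```

Because each checkbox bundles feedback with its partial grade, the same ticks that produce the score also produce the feedback that, per the abstract, could be communicated to students alongside (or instead of) the grade alone.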
Journal description:
Studies in Educational Evaluation publishes original reports of evaluation studies. Four types of articles are published by the journal: (a) Empirical evaluation studies representing evaluation practice in educational systems around the world; (b) Theoretical reflections and empirical studies related to issues involved in the evaluation of educational programs, educational institutions, educational personnel and student assessment; (c) Articles summarizing the state-of-the-art concerning specific topics in evaluation in general or in a particular country or group of countries; (d) Book reviews and brief abstracts of evaluation studies.