Keeping Elo alive: Evaluating and improving measurement properties of learning systems based on Elo ratings.

IF 1.5, Zone 3 (Psychology), JCR Q3 (Mathematics, Interdisciplinary Applications)
Maria Bolsinova, Bence Gergely, Matthieu J S Brinkhuis
Journal: British Journal of Mathematical & Statistical Psychology
Publication date: 2025-06-06
DOI: 10.1111/bmsp.12395 (https://doi.org/10.1111/bmsp.12395)

Abstract

The Elo rating system, which originates from competitive chess, has been widely used in large-scale online educational applications for on-the-fly estimation of ability, item calibration, and adaptivity. In this paper, we critically analyse the shortcomings of the Elo rating system in an educational context, shedding light on its measurement properties and on when it may fall short in accurately capturing student abilities and item difficulties. In a simulation study, we examine the asymptotic properties of the Elo rating system. Our results show that Elo ratings are generally not unbiased and that their variances are context-dependent. Furthermore, in scenarios where items are selected adaptively based on the current ratings and the item difficulties are updated alongside the student abilities, the variance of the ratings across items and students artificially increases over time and, as a result, the ratings do not converge. We propose a solution to this problem that uses two parallel chains of ratings, which removes the dependence of item selection on the current errors in the ratings.
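
For readers unfamiliar with the update the abstract refers to, the following is a minimal sketch of an Elo-style update as it is commonly written for student and item ratings. It is not taken from the paper: the logistic expected-score function, the shared constant step size `k`, and the names `expected_score` and `elo_update` are assumptions chosen for illustration.

```python
import math


def expected_score(theta: float, delta: float) -> float:
    """Assumed logistic model: probability that a student with ability
    rating theta answers an item with difficulty rating delta correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def elo_update(theta: float, delta: float, correct: int, k: float = 0.3):
    """One Elo-style update after a single response (correct is 0 or 1).

    The student rating moves in the direction of the surprise
    (observed minus expected) and the item rating moves by the same
    amount in the opposite direction. The step size k is a
    hypothetical constant, not a value from the paper.
    """
    step = k * (correct - expected_score(theta, delta))
    return theta + step, delta - step


# Usage: an evenly matched student and item, followed by a correct response.
new_theta, new_delta = elo_update(0.0, 0.0, correct=1)
```

In this sketch each update depends on the current ratings of both the student and the item, which is the setting the abstract describes when item difficulties are updated alongside student abilities.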

Source journal
CiteScore: 5.00
Self-citation rate: 3.80%
Publication volume: 34
Review time: >12 weeks
About the journal: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including: mathematical psychology; statistics; psychometrics; decision making; psychophysics; classification; and relevant areas of mathematics, computing and computer software. These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.