Evaluating soccer match prediction models: a deep learning approach and feature optimization for gradient-boosted trees

IF 2.9, Zone 3 (Computer Science), JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Machine Learning, Pub Date: 2024-08-23, DOI: 10.1007/s10994-024-06608-w

Calvin Yeung, Rory Bunker, Rikuhei Umemoto, Keisuke Fujii

Citations: 0

Abstract

Machine learning models have become increasingly popular for predicting the results of soccer matches; however, the lack of publicly available benchmark datasets has made model evaluation challenging. The 2023 Soccer Prediction Challenge required the prediction of match results first in terms of the exact goals scored by each team and second in terms of the probabilities for a win, draw, and loss. The original training set of matches and features provided for the competition was augmented with additional matches played between 4 April and 13 April 2023, the period after the training set ended but before the first matches to be predicted (on which performance was evaluated). A CatBoost model was employed using pi-ratings as the features, which were initially identified as the optimal choice for calculating the win/draw/loss probabilities. Notably, deep learning models have frequently been disregarded in this particular task. Therefore, in this study, we aimed to assess the performance of a deep learning model and determine the optimal feature set for a gradient-boosted tree model. The model was trained using the most recent 5 years of data, and three training and validation sets were used in a hyperparameter grid search. The results from the validation sets show that our model had strong performance and stability compared to previously published models from the 2017 Soccer Prediction Challenge for win/draw/loss prediction. Our model ranked 16th in the 2023 Soccer Prediction Challenge with a ranked probability score (RPS) of 0.2195.
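
The abstract reports performance as an RPS value without defining it. As a minimal sketch, the standard ranked probability score for ordered outcomes can be computed as below; lower is better, and 0 is a perfect forecast. This assumes the challenge uses the usual definition, with outcomes ordered (home win, draw, away win). The function names are illustrative and are not taken from the paper.

```python
from itertools import accumulate


def ranked_probability_score(probs, outcome_index):
    """RPS for a single match.

    probs: forecast probabilities over ordered outcomes, assumed here to be
           (home win, draw, away win).
    outcome_index: index of the outcome that actually occurred.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    observed = [1.0 if i == outcome_index else 0.0 for i in range(len(probs))]
    cum_forecast = list(accumulate(probs))
    cum_observed = list(accumulate(observed))
    # The final cumulative terms are both 1, so only the first r - 1 matter.
    squared = sum(
        (f - o) ** 2 for f, o in zip(cum_forecast[:-1], cum_observed[:-1])
    )
    return squared / (len(probs) - 1)


def mean_rps(forecasts, outcomes):
    """Average RPS over a set of matches."""
    scores = [ranked_probability_score(p, o) for p, o in zip(forecasts, outcomes)]
    return sum(scores) / len(scores)
```

Because the score is built from cumulative probabilities, a forecast that puts its mass on an outcome adjacent to the true one (for example, a draw when the home team wins) is penalized less than one that puts it on the opposite outcome.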

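Source journal

Machine Learning (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.00

Self-citation rate: 2.70%

Articles published: 162

Review time: 3 months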

Journal description: Machine Learning serves as a global platform dedicated to computational approaches in learning. The journal reports substantial findings on diverse learning methods applied to various problems, offering support through empirical studies, theoretical analysis, or connections to psychological phenomena. It demonstrates the application of learning methods to solve significant problems and aims to enhance the conduct of machine learning research with a focus on verifiable and replicable evidence in published papers.