Machine Learning in Rugby Union: Predicting and Identifying Key Performance Indicators for Professional Rugby Union Players in Match Play Based Workload
Xiangyu Ren, Simon Boisbluche, Kilian Philippe, Mathieu Demy, Sami Äyrämö, Ilkka Rautiainen, Shuzhe Ding, Jacques Prioux
European Journal of Sport Science, volume 25, issue 9. Published 2025-08-22. DOI: 10.1002/ejsc.70042. https://onlinelibrary.wiley.com/doi/10.1002/ejsc.70042
Abstract
Rugby union is an intermittent, high-intensity contact sport that requires the analysis of various training and match metrics. Time-motion analysis and video analysis have improved understanding of the interplay between training and match metrics. However, few studies have investigated the effect of workload on key performance indicators (KPIs) during matches. In this study, global positioning system (GPS) data were used to calculate cumulative workload values over the 7, 14, and 21 days prior to each game. After dimensionality reduction through principal component analysis (PCA), these workload values were used as features, with game KPIs as target variables. Regression models were built using linear regression (LR), support vector regression (SVR), random forest regression (RFR), and light gradient boosting machine (LightGBM). Model performance was assessed using the coefficient of determination (R²), root mean square error (RMSE), and correlation coefficient (r). The findings revealed that although individual GPS metrics exhibited weak correlations with KPIs, machine learning (ML) models, particularly RFR, successfully captured complex interactions and nonlinear relationships. These models achieved significantly improved predictive performance, with values ranging from 0.40 to 0.72 for certain KPIs. Using SHapley Additive exPlanations (SHAP) analysis and partial dependence plots, this study enhanced the interpretability of the ML models by identifying the influence of GPS features on KPIs and exploring their underlying mechanisms. These findings offer actionable insights for workload management, emphasizing critical factors that affect player performance.
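The feature construction and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The session data, the function names, the choice to exclude match day from each window, and the use of Pearson's formula for the correlation coefficient are all assumptions made for illustration. The PCA, model fitting, and SHAP steps are omitted.

```python
from datetime import date, timedelta
from math import sqrt


def cumulative_workload(sessions, match_day, window_days):
    """Sum session loads over the `window_days` days before `match_day`.

    Assumption: the match day itself is excluded from the window.
    `sessions` is a list of (date, load) pairs; the load unit is arbitrary.
    """
    start = match_day - timedelta(days=window_days)
    return sum(load for day, load in sessions if start <= day < match_day)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed; the abstract does not name the type)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical GPS-derived session loads for one player (e.g. total distance in metres).
sessions = [
    (date(2025, 1, 1), 5000.0),
    (date(2025, 1, 10), 6000.0),
    (date(2025, 1, 18), 4000.0),
    (date(2025, 1, 21), 3000.0),
]
match_day = date(2025, 1, 22)

# One cumulative-workload feature per look-back window (7, 14, and 21 days).
features = {w: cumulative_workload(sessions, match_day, w) for w in (7, 14, 21)}

# Hypothetical observed and predicted KPI values for four matches.
kpi_true = [1.0, 2.0, 3.0, 4.0]
kpi_pred = [1.5, 2.0, 2.5, 4.0]
scores = {
    "R2": r_squared(kpi_true, kpi_pred),
    "RMSE": rmse(kpi_true, kpi_pred),
    "r": pearson_r(kpi_true, kpi_pred),
}
```

In practice there would be one such feature per GPS metric and window, which is presumably why the feature set is reduced with PCA before modeling.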