{"title":"Population-adjusted national rankings in the Olympics","authors":"Robert C. Duncan, Andrew Parece","doi":"10.3233/jsa-240874","DOIUrl":"https://doi.org/10.3233/jsa-240874","url":null,"abstract":"Ranking countries in the Olympic Games by medal counts clearly favors large-population countries over small ones, while ranking by medals-per-capita produces national rankings with very small population countries on top. We discuss why this happens, and propose a new national ranking system for the Olympics, also based upon medals won, which is inclusive in the sense that countries of widely-varying population can achieve high rankings. This population-adjusted probability ranking ranks countries by how much evidence they show for high capability at Olympic sports. In particular, it ranks countries according to how improbable their medal counts would be in an idealized reference model of the Games which posits that all medal-winning nations have equal propensity per capita for winning medals. The ranking index U is defined using a simple binomial sum. Here we explain the method, and we present population-adjusted national rankings for the last three summer Olympics (London 2012, Rio 2016 and Tokyo 2020, held in 2021). If the advantages of this ranking method come to be understood by sports media covering the Olympics and by the interested public, it could be widely reported alongside raw medal counts, thus adding excitement and interest to the Olympics.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":0.6,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141654384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
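This is a minimal sketch of the population-adjusted index U under stated assumptions. The record says only that U is a simple binomial sum over a reference model in which medal-winning nations have equal per-capita propensity. The code therefore assumes that U is the upper-tail probability that a nation wins at least its observed number of medals. Under this assumption, each medal independently goes to a nation with probability equal to its share of the combined population of medal-winning nations. The function names and the example populations are hypothetical and do not come from the paper.

```python
from math import comb


def tail_probability(medals: int, total_medals: int, p: float) -> float:
    """P(X >= medals) for X ~ Binomial(total_medals, p), summed term by term."""
    return sum(
        comb(total_medals, k) * p**k * (1 - p) ** (total_medals - k)
        for k in range(medals, total_medals + 1)
    )


def population_adjusted_ranking(countries: dict[str, tuple[int, float]]) -> list[tuple[str, float]]:
    """countries maps name -> (medals won, population) for medal-winning nations only.

    Assumed reference model: equal per-capita propensity, so a nation's chance of
    taking any given medal is its share of the combined population.
    """
    total_medals = sum(m for m, _ in countries.values())
    total_pop = sum(pop for _, pop in countries.values())
    scores = {
        name: tail_probability(m, total_medals, pop / total_pop)
        for name, (m, pop) in countries.items()
    }
    # Smaller U = medal count more improbable under equal propensity = higher rank.
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical nations: (medals, population in millions).
    example = {"A": (10, 1.0), "B": (30, 100.0), "C": (20, 50.0)}
    for name, u in population_adjusted_ranking(example):
        print(f"{name}: U = {u:.3g}")
```

Under this reading, a small nation with several medals gets a tiny U. It then ranks above a large nation whose medal count is close to what its population alone would predict.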
{"title":"Sabermetrics by the sea: Evaluating college players with the Cape Cod Baseball League","authors":"Humbert Kilanowski, Thomas Moloney","doi":"10.3233/jsa-240771","DOIUrl":"https://doi.org/10.3233/jsa-240771","url":null,"abstract":"From the dawn of the “Moneyball” system of searching for players with undervalued skills, an increasing proportion of players chosen in the Major League draft has come from the collegiate ranks, and while every professional team has an analytics department, the draft remains the last frontier for identifying and acquiring the best prospective players. Thus, it has become more important in recent years to evaluate college players properly, and while players’ statistics during the college season can vary wildly due to differing levels of competition, it is necessary to find a more objective metric for measuring college players’ skills. We propose that the most effective metric for doing so comes from observing players’ performances during the summer, when the variable of strength of schedule can be directly controlled, as players of the same skill level compete against each other. Our study focuses on the Cape Cod Baseball League (CCBL), a prestigious summer league that attracts the most talented college players, from which many players are drafted into the Majors every year. Our reasons for choosing the CCBL are the aforementioned homogeneity of talent; the lack of effects of travel fatigue, as the teams all play in a concentrated geographical area; and the league’s built-in replacement level, as temporary players often fill roster spots for players who had been selected the previous autumn, but whose college teams have advanced to the College World Series or who play on a national team during part of the CCBL season. This replacement level is used to calculate a metric of Wins Above Replacement, which we call cWAR.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140672830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-season machine learning approach to examine the training load and injury relationship in professional soccer","authors":"Aritra Majumdar, Rashid Bakirov, Dan Hodges, Sean McCullagh, Tim Rees","doi":"10.3233/jsa-240718","DOIUrl":"https://doi.org/10.3233/jsa-240718","url":null,"abstract":"OBJECTIVES: The purpose of this study was to use machine learning to examine the relationship between training load and soccer injury with a multi-season dataset from one English Premier League club. METHODS: Participants were 35 male professional soccer players (aged 25.79±3.75 years, range 18–37 years; height 1.80±0.07 m, range 1.63–1.95 m; weight 80.70±6.78 kg, range 66.03–93.70 kg), with data collected from the 2014–2015 season until the 2018–2019 season. A total of 106 training load variables (40 GPS data, 6 personal information, 14 physical data, 4 psychological data and 14 ACWR, 14 MSWR and 14 EWMA data) were examined in relation to 133 non-contact injuries, with a high imbalance ratio of 0.013. RESULTS: XGBoost and Artificial Neural Network were implemented to train the machine learning models using four and a half seasons’ data, with the developed models subsequently tested on the following half season’s data. During the first four and a half seasons, there were 341 injuries; during the next half season there were 37 injuries. To interpret and visualize the output of each model and the contribution of each feature (i.e., training load) towards the model, we used the Shapley Additive Explanations (SHAP) approach. Of 37 injuries, XGBoost correctly predicted 26 injuries, with recall and precision of 73% and 10% respectively. Artificial Neural Network correctly predicted 28 injuries, with recall and precision of 77% and 13% respectively. In the model using Artificial Neural Network (the relatively more accurate model), last injury area and weight appeared to be the most important features contributing to the prediction of injury. CONCLUSIONS: This was the first study of its kind to use an Artificial Neural Network and a multi-season dataset for injury prediction. Our results demonstrate the potential to predict injuries with high recall, thereby identifying most of the injury cases, although precision suffered due to the high class imbalance. This approach to using machine learning provides potentially valuable insights for soccer organizations and practitioners when monitoring training load and injuries.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140673244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
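The record lists ACWR and EWMA variables without defining them. This hedged sketch uses two common definitions from the load-monitoring literature:

- a coupled rolling-average acute:chronic workload ratio, with a 7-day acute window inside a 28-day chronic window;
- an EWMA-based ratio with decay λ = 2/(N + 1).

The window lengths, the seeding of each EWMA with the first day's load, and the function names are assumptions. The paper's exact feature construction, including its MSWR variables, may differ.

```python
from typing import List, Optional


def rolling_acwr(loads: List[float], acute: int = 7, chronic: int = 28) -> List[Optional[float]]:
    """Coupled rolling-average ACWR: mean of the last `acute` days divided by the
    mean of the last `chronic` days (the acute window lies inside the chronic one).
    Entries are None until a full chronic window is available."""
    out: List[Optional[float]] = []
    for i in range(len(loads)):
        if i + 1 < chronic:
            out.append(None)
            continue
        acute_mean = sum(loads[i + 1 - acute : i + 1]) / acute
        chronic_mean = sum(loads[i + 1 - chronic : i + 1]) / chronic
        out.append(acute_mean / chronic_mean if chronic_mean else None)
    return out


def ewma(loads: List[float], span: int) -> List[float]:
    """EWMA with decay lambda = 2 / (span + 1), seeded with the first day's load (assumption)."""
    lam = 2 / (span + 1)
    out = [loads[0]]
    for x in loads[1:]:
        out.append(lam * x + (1 - lam) * out[-1])
    return out


def ewma_acwr(loads: List[float], acute: int = 7, chronic: int = 28) -> List[Optional[float]]:
    """Day-by-day ratio of a short-span EWMA to a long-span EWMA."""
    return [a / c if c else None for a, c in zip(ewma(loads, acute), ewma(loads, chronic))]


if __name__ == "__main__":
    # Hypothetical daily loads: four steady weeks, then one week at double load.
    daily = [100.0] * 28 + [200.0] * 7
    print(rolling_acwr(daily)[-1], ewma_acwr(daily)[-1])
```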
{"title":"Performance analysis in top handball matches in the seasons before, during, and after the COVID-19 pandemic","authors":"Paweł Krawczyk, Mateusz Szczerba, Jan Labiński, Maksymilian Smoliński","doi":"10.3233/jsa-240769","DOIUrl":"https://doi.org/10.3233/jsa-240769","url":null,"abstract":"The aim of the study was to determine whether there are differences in handball performance indicators between the Pre-COVID-19, COVID-19, and Post-COVID-19 seasons. The study material was obtained from the official match statistics of PGNiG Super league Ltd. Matches were played in the 2019/2020 season (Pre-COVID-19), the 2020/2021 season (during COVID-19), and the 2021/2022 season (Post-COVID-19). The Mann-Whitney U test was used for comparisons between two groups and the Kruskal-Wallis test for comparisons among three groups. In the Pre-COVID-19 season, players made on average 1.3 more 9-meter throws than in the Post-COVID-19 season. The Post-COVID-19 season is characterized by higher counts of 6-meter goals and 6-meter throws than the Pre-COVID-19 season. The results show higher goalkeeper effectiveness against 7-meter throws in the Pre-COVID-19 season than in the COVID-19 season. The increasing number of throws and goals from 6 meters, along with a decrease in the number of throws from 9 meters, reflects the latest trends in handball. A reduction in the number of offensive fouls and an increase in the number of fast attacks and in the effectiveness of goalkeepers’ interventions on 7-meter throws in the second round of the COVID-19 season indicate the adaptation of players to the new conditions created by the pandemic.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140268477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
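For readers unfamiliar with the two rank-based tests named in this abstract, here is a dependency-free sketch of their statistics. It computes the Mann-Whitney U statistic for two seasons and the Kruskal-Wallis H statistic, without tie correction, for three. Both tests pool the observations, give tied values the average of their ranks, and compare rank sums. P-values are omitted; in practice `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` compute both. The sample values in the usage lines are hypothetical per-match throw counts, not data from the study.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic of sample x against sample y (no p-value)."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def kruskal_wallis_h(*groups):
    """H statistic across groups, without tie correction (no p-value)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start : start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


if __name__ == "__main__":
    pre, during, post = [12, 10, 11, 13], [11, 9, 10, 12], [9, 8, 10, 11]  # hypothetical
    print(mann_whitney_u(pre, post), kruskal_wallis_h(pre, during, post))
```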
{"title":"Modeling and prediction of tennis matches at Grand Slam tournaments","authors":"N. Buhamra, A. Groll, S. Brunner","doi":"10.3233/jsa-240670","DOIUrl":"https://doi.org/10.3233/jsa-240670","url":null,"abstract":"In this manuscript, different approaches for modeling and prediction of tennis matches in Grand Slam tournaments are proposed. The data used here contain information on 5,013 matches in men’s Grand Slam tournaments from the years 2011–2022. All approaches considered are based on regression models, modeling the probability of the first-named player winning. Several potential covariates are considered, including the players’ age, the ATP ranking and points, odds, Elo rating as well as two additional age variables, which take into account that the optimal age of a tennis player is between 28 and 32 years. We compare the different regression model approaches with respect to three performance measures, namely classification rate, predictive Bernoulli likelihood, and Brier score in a 43-fold cross-validation-type approach for the matches of the years 2011 to 2021. The five models with the highest average ranks are then selected. To predict the 2022 tournaments and compare the predictions with the actual results, a “rolling window” strategy over a continuously updated data set is used, and the performance measures mentioned above are again calculated. Additionally, we examine whether the assumption of non-linear effects or additional court- and player-specific abilities is reasonable.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140408745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
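The three performance measures named in this abstract can be sketched for binary match outcomes as follows. Classification rate and Brier score follow their standard definitions. For the predictive Bernoulli likelihood, this sketch assumes the average probability assigned to the observed outcome; the paper's definition may differ in detail, for example by using a log scale. The inputs are predicted probabilities that the first-named player wins, and the function name is hypothetical.

```python
def evaluate(probs, outcomes):
    """Return (classification rate, predictive Bernoulli likelihood, Brier score).

    probs[i] is the predicted probability that the first-named player wins match i;
    outcomes[i] is 1 if that player won, else 0.
    """
    n = len(probs)
    pairs = list(zip(probs, outcomes))
    # Share of matches where thresholding p at 0.5 picks the actual winner.
    classification_rate = sum((p > 0.5) == (y == 1) for p, y in pairs) / n
    # Average probability assigned to the observed outcome (assumed definition); higher is better.
    likelihood = sum(p if y == 1 else 1 - p for p, y in pairs) / n
    # Mean squared difference between predicted probability and outcome; lower is better.
    brier = sum((p - y) ** 2 for p, y in pairs) / n
    return classification_rate, likelihood, brier


if __name__ == "__main__":
    # Hypothetical predictions for four matches.
    print(evaluate([0.8, 0.3, 0.6, 0.4], [1, 0, 0, 1]))
```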
{"title":"Factors associated with match outcomes in elite European football – insights from machine learning models","authors":"Maxime Settembre, Martin Buchheit, K. Hader, Ray Hamill, Adrien Tarascon, Raymond Verheijen, Derek McHugh","doi":"10.3233/jsa-240745","DOIUrl":"https://doi.org/10.3233/jsa-240745","url":null,"abstract":"AIM To examine the factors affecting European Football match outcomes using machine learning models. METHODS Fixtures of 269 teams competing in the top seven European leagues were extracted (2001/02 to 2021/22, total >61,000 fixtures). We used eXtreme Gradient Boosting (XGBoost) to assess the relationship between result (win, draw, loss) and the explanatory variables. RESULTS The top contributors to match outcomes were travel distance, between-team differences in Elo (with a contribution magnitude to the model half of that of travel distance and match location), and recent domestic performance (with a contribution magnitude of a fourth to a third of that of travel distance and match location), irrespective of the dataset and context analyzed. Contextual factors such as rest days between matches, the number of matches since the managers have been in charge, and match-to-match player rotations were also shown to influence match outcomes; however, their contribution magnitude was consistently 4–8 times smaller than that of the three main contributors mentioned above. CONCLUSIONS Machine learning has proven to provide insightful results for coaches and support staff, who may use these results to set expectations and adjust their practices in relation to the different contexts examined here.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140425891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
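Two of the explanatory variables named in this abstract can be illustrated with standard formulas: between-team Elo differences and travel distance. The abstract does not say which Elo variant or distance measure was used. The following choices are therefore assumptions, not the authors' settings:

- a K-factor of 20;
- an optional home-advantage offset;
- the 400-point logistic scale;
- great-circle (haversine) distance as the travel measure.

```python
import math


def elo_expected(r_home: float, r_away: float, home_advantage: float = 0.0) -> float:
    """Expected score of the home side under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_away - r_home - home_advantage) / 400))


def elo_update(r_home: float, r_away: float, score_home: float,
               k: float = 20.0, home_advantage: float = 0.0) -> tuple:
    """score_home is 1 for a home win, 0.5 for a draw, 0 for a loss.

    Returns the updated (home, away) ratings; the exchange of points is zero-sum.
    """
    delta = k * (score_home - elo_expected(r_home, r_away, home_advantage))
    return r_home + delta, r_away - delta


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = 6371.0) -> float:
    """Great-circle distance between two stadiums, used here as a travel-distance proxy."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Hypothetical fixture: ratings and stadium coordinates are illustrative only.
    elo_gap = 1620.0 - 1540.0
    print(elo_gap, elo_expected(1620.0, 1540.0), haversine_km(51.5, -0.1, 48.9, 2.35))
```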
{"title":"Finding repeatable progressive pass clusters and application in international football","authors":"Bikash Deb, Javier Fernández Navarro, A. McRobert, Ian Jarman","doi":"10.3233/jsa-220732","DOIUrl":"https://doi.org/10.3233/jsa-220732","url":null,"abstract":"Progressive passing in football (soccer) is a key aspect of creating positive possession outcomes. Whilst this is well established, there is not a consistent way to describe the different types of progressive passes. We expand on the previous literature, providing a complete methodological approach to progressive pass clustering from selection of the number of clusters (k) to risk-reward profiling of these progressive pass types. In this paper the Separation and Concordance (SeCo) framework is utilised to provide a process to analyse k-means clustering solutions in a more repeatable way. The results demonstrate that stable progressive pass clusters can be found in international football and, in terms of efficacy, that the progressive passes “Mid Central to Mid Half Space” in build-up and “Mid Half Space to Final Central” into the final third have the best balance between risk (turnover) and reward (shot created) in the subsequent possession. This allowed for opposition profiling of player and team patterns in different phases of play, with a case study presented for the teams in the Last 16 of the 2022 World Cup.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139608008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
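The clustering step can be illustrated with plain k-means (Lloyd's algorithm) on pass start and end coordinates, followed by a per-cluster risk-reward profile. This is a hedged sketch only. It does not implement the SeCo framework the paper uses to choose k and assess repeatability. The four-coordinate pass representation, the fixed initial centroids, the 0/1 turnover and shot outcomes, and the function names are assumptions for illustration.

```python
def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest centroid (squared Euclidean
    distance), move each centroid to the mean of its members, and stop when the
    centroids no longer change."""
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(len(centroids)),
                key=lambda j, pt=pt: sum((a - b) ** 2 for a, b in zip(pt, centroids[j])))
            for pt in points
        ]
        updated = []
        for j, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(v) / len(members) for v in zip(*members)) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def risk_reward(labels, turnovers, shots):
    """Per-cluster (turnover rate, shot-created rate) from 0/1 outcomes of the next possession."""
    profile = {}
    for j in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        profile[j] = (sum(turnovers[i] for i in idx) / len(idx),
                      sum(shots[i] for i in idx) / len(idx))
    return profile


if __name__ == "__main__":
    # Hypothetical passes as (start_x, start_y, end_x, end_y) in metres.
    passes = [(30, 34, 55, 20), (32, 36, 57, 22), (60, 10, 90, 30), (62, 12, 92, 32)]
    cents, labs = kmeans(passes, [passes[0], passes[2]])
    print(cents, risk_reward(labs, [0, 1, 0, 0], [0, 0, 1, 0]))
```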
{"title":"Winner prediction in an ongoing one day international cricket match","authors":"Yash Agrawal, Kundan Kandhway","doi":"10.3233/jsa-220735","DOIUrl":"https://doi.org/10.3233/jsa-220735","url":null,"abstract":"Cricket is a team sport with an intricate set of rules, where players specialize in multiple skills such as batting, bowling, and fielding. Playing conditions and home advantage also impact the game. Thus, it is quite challenging to build an accurate quantitative model for the game. In this paper, we provide a data-driven approach to predict the winner of a cricket match. We divide the ongoing match into various states and provide a prediction for each state using supervised machine learning models. We employ dynamic features that account for the current match situation, together with static features such as team strength, winner of the toss, and home advantage. We also use SHAP scores, an explainable AI technique, to interpret the proposed prediction model. We use ball-by-ball data from 1359 men’s one day international cricket matches played between January 2004 and January 2022 to present our results. We achieved the best in-play prediction accuracy of about 85%. SHAP scores reveal that during the initial phases of the match, the model treats static features like team strength as more important than others when making predictions. But as the match progresses, dynamic features capturing the current match situation become exceedingly important. Our work may be useful in preparing tools for in-play winner prediction for live cricket matches that can be used in websites and mobile applications covering the sport, in providing analytics during live television commentary, and in legal betting platforms.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139535056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
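To illustrate the dynamic features mentioned in this abstract, the sketch below derives chase-state quantities for the second innings of a 50-over match (300 legal deliveries). It then merges them with static features. The abstract does not give the paper's exact feature set or state definitions. The `ChaseState` class, the feature names, and the `feature_vector` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChaseState:
    """Snapshot of a second innings (hypothetical state definition)."""
    target: int        # runs required to win: first-innings total + 1
    runs: int          # runs scored so far in the chase
    wickets: int       # wickets lost so far
    balls_bowled: int  # legal deliveries bowled so far


def dynamic_features(s: ChaseState, balls_per_innings: int = 300) -> dict:
    """Situation-dependent features that change ball by ball."""
    balls_left = balls_per_innings - s.balls_bowled
    runs_needed = s.target - s.runs
    overs_bowled = s.balls_bowled / 6
    overs_left = balls_left / 6
    return {
        "runs_needed": runs_needed,
        "balls_left": balls_left,
        "wickets_in_hand": 10 - s.wickets,
        "current_run_rate": s.runs / overs_bowled if overs_bowled else 0.0,
        "required_run_rate": runs_needed / overs_left if overs_left else float("inf"),
    }


def feature_vector(s: ChaseState, strength_diff: float, won_toss: bool, at_home: bool) -> dict:
    """Dynamic features plus static ones (team strength, toss winner, home advantage)."""
    return {
        **dynamic_features(s),
        "strength_diff": strength_diff,
        "won_toss": int(won_toss),
        "at_home": int(at_home),
    }


if __name__ == "__main__":
    # Hypothetical state: chasing 251, on 120 for 3 after 25 overs.
    print(feature_vector(ChaseState(251, 120, 3, 150), strength_diff=0.2, won_toss=True, at_home=False))
```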
{"title":"Quantitative analysis of professional basketball: A qualitative discussion","authors":"Yukun Zhou, Tianyi Li","doi":"10.3233/jsa-220713","DOIUrl":"https://doi.org/10.3233/jsa-220713","url":null,"abstract":"Quantitative analysis of professional basketball has become an attractive field for experienced data analysts, and the recent availability of high-resolution datasets pushes data-driven basketball analytics to a higher level. We present a qualitative discussion on quantitative professional basketball. We propose and discuss the dimensions, the levels of granularity, and the types of tasks in quantitative basketball. We review key literature from the past two decades and map it onto the proposed qualitative framework, with an evolutionary perspective and an emphasis on recent advances. A list of questions around professional basketball that could be approached with quantitative tools is presented, pointing to directions for future research. We touch on the new landscape of virtual basketball as a way of enriching the space for quantitative analysis. This report serves as a qualitative primer for quantitative analysis of professional basketball, highlighting the growing prospects of this promising research area.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139535760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A goal-aligned coordinate system for invasion games","authors":"Ulrik Brandes","doi":"10.3233/jsa-220706","DOIUrl":"https://doi.org/10.3233/jsa-220706","url":null,"abstract":"Spatial locations of players and game devices are a fundamental data type in team-sports analytics. They are typically specified in Cartesian coordinates, but with varying conventions for the origin, orientation, and scaling. In invasion games such as football, basketball, or hockey, however, many markings are of fixed dimension even when the field of play is not, so that the game-specific meaning of locations does not scale uniformly. We propose an alternative coordinate system that accommodates variable field sizes by using the goals instead of a corner or the center of the field of play as frames of reference.","PeriodicalId":53203,"journal":{"name":"Journal of Sports Analytics","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139203485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
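This is a minimal illustration of why goal-relative coordinates help. A location can be described by its distance and angle from the centre of the attacked goal. Fixed-dimension markings, such as the penalty spot 11 m out, then land at the same coordinates on pitches of different sizes; coordinates scaled to the field do not. This is not the coordinate system proposed in the paper, whose construction the abstract does not spell out. The corner origin, the goal position at (length, width/2), and the function name are assumptions.

```python
import math


def goal_polar(x: float, y: float, length: float, width: float) -> tuple[float, float]:
    """Distance and angle (degrees) of (x, y) from the centre of the goal at x = length.

    Input coordinates have their origin at a corner flag (assumption). An angle of 0
    means straight out from the goal along the pitch's long axis.
    """
    dx = length - x
    dy = y - width / 2
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    # Penalty spot on two differently sized pitches (hypothetical dimensions, metres).
    print(goal_polar(94.0, 34.0, 105.0, 68.0), goal_polar(89.0, 32.0, 100.0, 64.0))
    # The field-scaled x-coordinates of the same two spots differ: 94/105 versus 89/100.
```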