Beyond Playing Positions: Categorizing Soccer Players Based on Match-Specific Running Performance Using Machine Learning
Michel de Haan, Stephan van der Zwaard, Jurrit Sanders, Peter J Beek, Richard T Jaspers
Journal of Sports Science and Medicine, 24(3), 565-577. Published 2025-09-01. Journal Article.
DOI: https://doi.org/10.52082/jssm.2025.565
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12418189/pdf/
Abstract
Soccer players are frequently categorized by playing positions, both in the scientific literature and in practice. However, the utility of this approach in evaluating physical match performance and optimizing physical training programs remains unclear. This study compares the effectiveness of categorizing soccer players by their playing position versus using unsupervised machine learning based on match-specific running performance. Match-specific running data were collected from 40 young elite male soccer players over two seasons. Thirty-one of these players completed a 20-meter sprint test and a maximal incremental treadmill test to measure maximal oxygen uptake. Players were categorized both by playing position and by subgroups derived through k-means clustering based on match-specific running performance. Differences in sprint capacity, endurance capacity, and match-specific running performance were compared between and within playing positions, as well as between and within clusters. The two categorization methods were further compared for variance within subgroups and standardized differences between subgroups for total distance (TD), low-intensity running (LIR), moderate-intensity running (MIR), high-intensity running (HIR), and sprint distance during matches. Match-specific running performance differed between playing positions, despite notable inter-individual differences in running intensities within playing positions. Clustering based on match-specific running performance revealed less variance within groups (TD: P = 0.049, LIR: P = 0.032, HIR: P = 0.033) and larger standardized differences between groups (LIR: P = 0.037, MIR: P = 0.041, HIR: P = 0.035, Sprint: P = 0.018) compared to grouping by playing position. Moreover, 20-meter sprint speed differed between the sprint and high-intensity endurance clusters (25.22 vs 23.75 km/h, P = 0.012), but not between playing positions. Using unsupervised machine learning to categorize soccer players improves the identification of player groups with similar match-specific running performance, thereby supporting performance evaluation and contributing to the optimization of physical training.
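The comparison described in the abstract, k-means clusters versus playing positions judged by within-group variance, can be sketched as follows. This is a minimal illustration and not the authors' pipeline. The player values are invented, the choice of three clusters is an assumption (the abstract does not state the number), z-scoring of the five metrics is assumed, and the deterministic farthest-point initialization is a swapped-in simplification so the result is reproducible. All function names are hypothetical.

```python
import statistics

# Hypothetical per-match averages in meters: [TD, LIR, MIR, HIR, sprint].
# These values are invented for illustration and are NOT data from the study.
PLAYERS = [
    ("P1", "defender",   [9800, 7200, 1600, 600, 400]),
    ("P2", "forward",    [9900, 7250, 1650, 610, 390]),
    ("P3", "midfielder", [9700, 7150, 1580, 590, 380]),
    ("P4", "midfielder", [11800, 7600, 2800, 1150, 250]),
    ("P5", "defender",   [11900, 7650, 2850, 1160, 240]),
    ("P6", "forward",    [11700, 7550, 2760, 1140, 250]),
    ("P7", "defender",   [9000, 7400, 1200, 300, 100]),
    ("P8", "midfielder", [9100, 7450, 1230, 310, 110]),
    ("P9", "forward",    [8900, 7350, 1180, 290, 80]),
]


def zscore_columns(rows):
    """Scale each metric to mean 0, SD 1 so no single metric dominates distances."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Assumed deterministic init: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, recompute means."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def pooled_within_variance(values, groups):
    """Within-group sum of squares divided by (n - number of groups)."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss = sum(
        sum((v - statistics.fmean(vals)) ** 2 for v in vals)
        for vals in by_group.values()
    )
    return ss / (len(values) - len(by_group))


features = zscore_columns([row for _, _, row in PLAYERS])
positions = [pos for _, pos, _ in PLAYERS]
cluster_labels = kmeans(features, k=3)

# Compare the two groupings on one metric: sprint distance (column index 4).
sprint = [row[4] for _, _, row in PLAYERS]
var_by_cluster = pooled_within_variance(sprint, cluster_labels)
var_by_position = pooled_within_variance(sprint, positions)
print("within-group sprint variance, clusters: ", round(var_by_cluster, 1))
print("within-group sprint variance, positions:", round(var_by_position, 1))
```

Because the invented players were deliberately spread across positions regardless of running profile, the lower within-cluster variance here holds by construction and says nothing about the study's actual effect sizes. With real data, a library implementation with multiple random restarts would normally replace the hand-written loop.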
About the journal:
The Journal of Sports Science and Medicine (JSSM) is a non-profit scientific electronic journal that publishes research and review articles, together with case studies, in the fields of sports medicine and the exercise sciences. JSSM is published quarterly in March, June, September and December. JSSM also publishes editorials, a "letter to the editor" section, and abstracts from international and national congresses, panel meetings, conferences and symposia, and can function as an open discussion forum on significant issues of current interest.