Development and internal validation of machine learning prognostic models of sports injuries using self-reported data in athletics (track and field): The influence of quantity and quality of features.

IF 2.5 · CAS Zone 2 (Medicine) · JCR Q2 SPORT SCIENCES
Spyridon Iatropoulos, Pierre-Eddy Dandrieux, Pascal Edouard, Laurent Navarro
Journal of Sports Sciences, pp. 1-15. Journal article, published 2025-06-13.
DOI: 10.1080/02640414.2025.2517971 (https://doi.org/10.1080/02640414.2025.2517971)
Citations: 0

Abstract

This study aimed to compare the performance of machine learning (ML) prognostic models for sports injuries when trained on (i) baseline data (i.e. collected once) vs. monitoring data (i.e. collected frequently over a period), (ii) raw monitoring data vs. time-integrating engineered features derived from the same data, and (iii) different numbers of features. Self-reported data collected from athletes in athletics over 39 weeks during a previous randomised controlled trial constituted the dataset for model development. Baseline features, monitoring features, and two time-integrating feature engineering strategies were used. Seven ML algorithms were trained with different groups and numbers of features and validated internally with bootstrapping. The models' discrimination was compared statistically using t-tests or Mann-Whitney tests (α = 0.00026). The dataset comprised 4537 cases, including 149 injuries, from 165 athletes. Monitoring features outperformed baseline features in 5 of the 7 algorithms (p < 0.00026). The two feature engineering strategies showed marginal differences (1-8%) in 4 of the 7 algorithms (p < 0.00026). Larger numbers of features consistently improved performance in 6 of the 7 algorithms. Developing injury prediction ML models from self-reported data in athletics seems promising, but performance is highly influenced by the quality and quantity of features.
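The abstract does not name the two time-integrating feature engineering strategies. Purely as an illustration, the sketch below assumes two common ways of turning a weekly self-reported series (for example a hypothetical weekly training-load score) into features that reflect recent history rather than a single raw value: a trailing rolling mean and an exponentially weighted moving average (EWMA). The function names, window length and smoothing factor are assumptions, not the authors' choices.

```python
from typing import List, Sequence


def rolling_mean(values: Sequence[float], window: int) -> List[float]:
    """Trailing mean of the current week and up to window-1 previous weeks.

    At the start of the series fewer weeks are available, so the mean is
    taken over whatever history exists (assumption: no padding).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    for t in range(len(values)):
        chunk = values[max(0, t - window + 1): t + 1]  # only weeks <= t
        out.append(sum(chunk) / len(chunk))
    return out


def ewma(values: Sequence[float], alpha: float) -> List[float]:
    """Exponentially weighted moving average.

    s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_{t-1}; a larger alpha
    gives more weight to the most recent week.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    for t, x in enumerate(values):
        out.append(float(x) if t == 0 else alpha * x + (1 - alpha) * out[-1])
    return out
```

Both functions use only weeks up to and including week t, so pairing the week-t feature with an injury outcome recorded in a later week keeps future information out of the feature.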
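The abstract states only that the models were "validated internally with bootstrapping" and compared on discrimination. The sketch below assumes discrimination is measured by the area under the ROC curve (AUC) and uses one common bootstrap scheme, out-of-bag evaluation: fit on a resample drawn with replacement, then score the cases that were never drawn. The linear "mean difference" model is a hypothetical stand-in, not one of the seven algorithms; Harrell's optimism-corrected bootstrap is another widely used alternative.

```python
import random
from statistics import fmean
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: share of (injured, uninjured) pairs in which the
    injured case gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y) -> List[float]:
    """Hypothetical stand-in model:
    weight_j = mean(feature j | injured) - mean(feature j | uninjured)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    return [fmean(r[j] for r in pos) - fmean(r[j] for r in neg)
            for j in range(len(X[0]))]


def predict(weights, X) -> List[float]:
    """Linear risk score: dot product of weights and features."""
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def bootstrap_oob_auc(X, y, n_boot: int = 200, seed: int = 0) -> List[float]:
    """Fit on each bootstrap resample, score its out-of-bag cases,
    and collect one AUC per usable resample."""
    rng = random.Random(seed)
    n = len(y)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]  # cases never drawn
        y_bag = [y[i] for i in idx]
        y_oob = [y[i] for i in oob]
        # With rare outcomes a resample can miss a class entirely; skip it.
        if len(set(y_bag)) < 2 or len(set(y_oob)) < 2:
            continue
        weights = fit_mean_difference([X[i] for i in idx], y_bag)
        aucs.append(auc(predict(weights, [X[i] for i in oob]), y_oob))
    return aucs
```

This sketch resamples individual cases for simplicity. Because the 4537 cases come from only 165 athletes, resampling whole athletes instead may be preferable, so that weeks from the same athlete do not appear in both the fitting and the evaluation sets.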
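The abstract reports comparing discrimination with t-tests or Mann-Whitney tests at α = 0.00026, without stating how the test was chosen. Assuming the two samples being compared are bootstrap AUC values for the same algorithm trained on two feature sets, the sketch below implements only the two-sided Mann-Whitney U test, using the normal approximation with a tie correction and no continuity correction. In practice `scipy.stats.mannwhitneyu` provides the same test with exact and continuity-corrected options.

```python
import math
from typing import Sequence

ALPHA = 0.00026  # significance threshold reported in the abstract


def mann_whitney_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values pooled[i..j].
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # every value identical: no evidence of a difference
    z = (u_a - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

A difference would then be declared when `mann_whitney_p(aucs_a, aucs_b) < ALPHA`. Note that p-values computed on bootstrap replicates shrink as more replicates are drawn, so they are best read together with the number of replicates used.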

Source journal: Journal of Sports Sciences (Social Sciences: Sport Sciences)
CiteScore: 6.30
Self-citation rate: 2.90%
Articles published: 147
Review time: 12 months
About the journal: The Journal of Sports Sciences has an international reputation for publishing articles of a high standard and is both Medline and Clarivate Analytics-listed. It publishes research on various aspects of the sports and exercise sciences, including anatomy, biochemistry, biomechanics, performance analysis, physiology, psychology, sports medicine and health, as well as coaching and talent identification, kinanthropometry and other interdisciplinary perspectives. The emphasis of the Journal is on the human sciences, broadly defined and applied to sport and exercise. Besides experimental work in human responses to exercise, the subjects covered will include human responses to technologies such as the design of sports equipment and playing facilities, research in training, selection, performance prediction or modification, and stress reduction or manifestation. Manuscripts considered for publication include those dealing with original investigations of exercise, validation of technological innovations in sport or comprehensive reviews of topics relevant to the scientific study of sport.