Jordan Rodu, Alexandra F DeJong Lempke, Natalie Kupperman, Jay Hertel
Journal: Sports Medicine - Open (impact factor 4.1; JCR Q1, Sport Sciences)
Published: 2024-11-14
DOI: 10.1186/s40798-024-00788-4 (https://doi.org/10.1186/s40798-024-00788-4)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11564444/pdf/
On Leveraging Machine Learning in Sport Science in the Hypothetico-deductive Framework.
Supervised machine learning (ML) offers an exciting suite of algorithms that could benefit research in sport science. In principle, supervised ML approaches were designed for pure prediction, as opposed to explanation, leading to a rise in powerful but opaque algorithms. Recently, two subdomains of ML have grown in popularity: explainable ML, which allows us to "peek into the black box," and interpretable ML, which encourages using algorithms that are inherently interpretable. The increased transparency of these powerful ML algorithms may provide considerable support for the hypothetico-deductive framework, in which hypotheses are generated from prior beliefs and theory and are assessed against data collected specifically to test those hypotheses. However, this paper shows why ML algorithms are fundamentally different from statistical methods, even when using explainable or interpretable approaches. Translating potential insights from supervised ML algorithms, while seemingly straightforward in many cases, can present unanticipated challenges. While supervised ML cannot be used to replace statistical methods, we propose ways in which the sport sciences community can take advantage of supervised ML in the hypothetico-deductive framework.

In this manuscript we argue that supervised machine learning can and should augment our exploratory investigations in sport science, but that leveraging potential insights from supervised ML algorithms should be undertaken with caution. We justify our position through a careful examination of supervised machine learning, and provide a useful analogy to help elucidate our findings. Three case studies are provided to demonstrate how supervised machine learning can be integrated into exploratory analysis.

Supervised machine learning should be integrated into the scientific workflow with requisite caution. The approaches described in this paper provide ways to safely leverage the strengths of machine learning, such as the flexibility ML algorithms provide for fitting complex patterns, while avoiding the pitfalls that may arise when trying to integrate findings from ML algorithms into domain knowledge: at best wasted effort and money, and at worst misguided clinical recommendations.

KEY POINTS:
Some supervised machine learning algorithms and statistical models are used to solve the same problem, y = f(x) + ε, but differ fundamentally in motivation and approach.
The hypothetico-deductive framework, in which hypotheses are generated from prior beliefs and theory and are assessed against data collected specifically to test those hypotheses, is one of the core frameworks comprising the scientific method.
In the hypothetico-deductive framework, supervised machine learning can be used in an exploratory capacity. However, it cannot replace the use of statistical methods, even as explainable and interpretable machine learning methods become increasingly popular.
Improper use of supervised machine learning in the hypothetico-deductive framework is tantamount to p-value hacking in statistical methods.
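
The comparison with p-value hacking can be illustrated with a small simulation. The sketch below is not taken from the paper and does not reproduce its case studies. It assumes a hypothetical setting in which 50 candidate features are pure noise with respect to the outcome, and it uses a simple "pick the feature most correlated with y" step as a stand-in for whatever an explainable or interpretable ML method might flag. All names (`pearson`, `simulate`, `average_gap`) and all parameter values are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_athletes=40, n_features=50, seed=0):
    """One run: select a feature on exploratory data, re-check it on fresh data.

    Returns (selected feature index,
             |r| on the data used for selection,
             |r| on independently drawn data).
    """
    rng = random.Random(seed)

    def draw(n):
        # Pure noise: by construction no feature is related to the outcome y.
        features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        return features, y

    x_explore, y_explore = draw(n_athletes)
    x_confirm, y_confirm = draw(n_athletes)

    # Exploratory step (stand-in for an ML-derived "insight"):
    # keep the feature that looks most associated with the outcome.
    r_explore = [pearson(col, y_explore) for col in x_explore]
    best = max(range(n_features), key=lambda j: abs(r_explore[j]))

    # Confirmatory step: evaluate the same feature on data not used to choose it.
    r_confirm = pearson(x_confirm[best], y_confirm)
    return best, abs(r_explore[best]), abs(r_confirm)


def average_gap(n_runs=100):
    """Average |r| of the selected feature on the same data vs. on fresh data."""
    same_total, fresh_total = 0.0, 0.0
    for seed in range(n_runs):
        _, r_same, r_fresh = simulate(seed=seed)
        same_total += r_same
        fresh_total += r_fresh
    return same_total / n_runs, fresh_total / n_runs


if __name__ == "__main__":
    same, fresh = average_gap()
    print(f"mean |r| on selection data: {same:.2f}")
    print(f"mean |r| on fresh data:     {fresh:.2f}")
```

Under these assumptions, the association measured on the data used for selection should come out noticeably larger than the one measured on fresh data, even though no real relationship exists. That gap is the reason a hypothesis suggested by an exploratory ML step would still need to be assessed against data collected specifically to test it.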