Advancing NFL win prediction: from Pythagorean formulas to machine learning algorithms.

Impact factor: 2.6 (JCR Q2, Sport Sciences)

Frontiers in Sports and Active Living, vol. 7, article 1638446. Pub Date: 2025-09-12. eCollection Date: 2025-01-01. DOI: 10.3389/fspor.2025.1638446

Caroline Weirich, Jun Woo Kim, Youngmin Yoon, Seunghoon Jeong


Citations: 0

Abstract

This study evaluates the predictive performance of traditional and machine learning-based models in forecasting NFL team winning percentages over a 21-season dataset (2003-2023). Specifically, we compare the Pythagorean expectation formula, which is commonly used in sports analytics, with Random Forest regression and a feedforward Neural Network model. Using key performance indicators such as points scored, points allowed, turnovers, rushing and passing efficiency, and penalties, the machine learning models demonstrate superior predictive accuracy. The Neural Network model achieved the highest performance (MAE = 0.052, RMSE = 0.064, R² = 0.891), followed by the Random Forest model; both significantly outperformed the Pythagorean method. Feature importance analysis using SHAP values identifies points scored and points allowed as the most influential predictors, supplemented by margin of victory, turnovers, and offensive efficiency metrics. These findings underscore the limitations of fixed-formula models and highlight the flexibility and robustness of data-driven approaches. The study offers practical implications for analysts, coaches, and sports management professionals seeking to optimize strategic decisions and competitive performance. Ultimately, the integration of advanced machine learning models provides a powerful tool for enhancing decision-making processes across the NFL landscape.
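To make the baseline and the reported error metrics concrete, here is a minimal Python sketch of the Pythagorean expectation and of MAE, RMSE, and R². It is not the authors' code. The abstract does not state which exponent the study uses, so the default of 2.37 (a value commonly quoted for the NFL) is an assumption, and the season totals in the demo are hypothetical numbers chosen only for illustration.

```python
import math


def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and points allowed.

    The exponent 2.37 is an assumed, commonly quoted NFL value; the
    abstract does not say which exponent the study uses.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical team seasons: (points scored, points allowed, actual win pct).
    seasons = [(450, 300, 0.750), (380, 390, 0.471), (290, 420, 0.294)]
    actual = [s[2] for s in seasons]
    predicted = [pythagorean_win_pct(s[0], s[1]) for s in seasons]
    print("MAE: ", round(mae(actual, predicted), 3))
    print("RMSE:", round(rmse(actual, predicted), 3))
    print("R^2: ", round(r_squared(actual, predicted), 3))
```

The same three metric functions could be applied to predictions from any model, which is how a fixed formula and a fitted regressor can be compared on equal terms.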
