Advancing NFL win prediction: from Pythagorean formulas to machine learning algorithms.

Impact factor: 2.6 (JCR Q2, Sport Sciences)
Frontiers in Sports and Active Living. Pub Date: 2025-09-12. eCollection Date: 2025-01-01. DOI: 10.3389/fspor.2025.1638446
Caroline Weirich, Jun Woo Kim, Youngmin Yoon, Seunghoon Jeong
Volume 7, article 1638446. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12463883/pdf/
Citations: 0

Abstract

This study evaluates the predictive performance of traditional and machine learning-based models in forecasting NFL team winning percentages over a 21-season dataset (2003-2023). Specifically, we compare the Pythagorean expectation formula, commonly used in sports analytics, with Random Forest regression and a feedforward Neural Network model. Using key performance indicators such as points scored, points allowed, turnovers, rushing and passing efficiency, and penalties, the machine learning models demonstrate superior predictive accuracy. The Neural Network model achieved the highest performance (MAE = 0.052, RMSE = 0.064, R² = 0.891), followed by the Random Forest model, both of which significantly outperformed the Pythagorean method. Feature importance analysis using SHAP values identifies points scored and points allowed as the most influential predictors, supplemented by margin of victory, turnovers, and offensive efficiency metrics. These findings underscore the limitations of fixed-formula models and highlight the flexibility and robustness of data-driven approaches. The study offers practical implications for analysts, coaches, and sports management professionals seeking to optimize strategic decisions and competitive performance. Ultimately, the integration of advanced machine learning models provides a powerful tool for enhancing decision-making processes across the NFL landscape.
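
As a rough illustration of the baseline and the error metrics named in the abstract, the following minimal Python sketch computes a Pythagorean expected winning percentage and the MAE, RMSE, and R² of a set of predictions. The exponent of 2.37 is an assumption (a value often quoted for the NFL), not a figure taken from this abstract, and the function names and sample numbers are hypothetical.

```python
import math


def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and allowed.

    The default exponent is an assumed value, not one stated in the abstract.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up season totals (points for, points against), for illustration only.
seasons = [(450, 300), (350, 350), (280, 420)]
actual_win_pct = [0.75, 0.50, 0.25]  # hypothetical observed values
predicted = [pythagorean_win_pct(pf, pa) for pf, pa in seasons]

print("MAE: ", mae(actual_win_pct, predicted))
print("RMSE:", rmse(actual_win_pct, predicted))
print("R^2: ", r_squared(actual_win_pct, predicted))
```

The same three metric functions could be applied to the output of any regression model, which is how a fixed formula and a learned model can be compared on equal terms.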
