Machine learning prediction of factors affecting Major League Baseball (MLB) game attendance: algorithm comparisons and macroeconomic factor of unemployment

Juho Park, Junghwan Cho, Alex C. Gang, Hyun-Woo Lee, Paul M. Pedersen
International Journal of Sports Marketing and Sponsorship. Published 2024-02-08. DOI: 10.1108/ijsms-06-2023-0129 (https://doi.org/10.1108/ijsms-06-2023-0129)

Abstract

Purpose: This study aims to identify an automated machine learning algorithm with high accuracy that sport practitioners can use to identify the specific factors for predicting Major League Baseball (MLB) attendance. Furthermore, by predicting spectators for each league (American League and National League) and each division in MLB, the authors identify the specific factors that increase accuracy, discuss them and provide implications for marketing strategies for academics and practitioners in sport.

Design/methodology/approach: This study used six years of daily MLB game data (2014–2019). Predictors such as game performance, weather and unemployment rate were collected, and the attendance rate was obtained as the observed (outcome) variable. Random Forest, Lasso regression and XGBoost were used to build the prediction models, and the analysis was conducted using Python 3.7.

Findings: After its tuning parameters were fine-tuned, the XGBoost model performed best in forecasting the attendance rate, with an RMSE of 0.14 and an R² of 0.62. The most influential variable in the model was “Rank”, with a value of 0.247, followed in order by “Day of the week”, “Home team” and “Day/Night game”. The “Unemployment rate”, as a macroeconomic factor, had a value of 0.06, and the weather factors had a combined value of 0.147.

Originality/value: This research highlights the unemployment rate as a determinant of MLB game attendance rates. Beyond contextual elements such as climate, the findings underscore the significance of economic factors, particularly unemployment rates, and call for further investigation into these factors to gain a more comprehensive understanding of game attendance.
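The two fit statistics reported in the abstract, RMSE and R², can be illustrated with a minimal sketch. This is not the authors' code: the function names and the four attendance-rate values below are hypothetical and chosen only to show how the metrics are computed from observed and predicted values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical attendance rates (fraction of stadium capacity).
# These are illustrative numbers, NOT data from the study.
observed = [0.50, 0.70, 0.90, 0.60]
predicted = [0.55, 0.65, 0.80, 0.70]

print("RMSE:", round(rmse(observed, predicted), 4))
print("R^2:", round(r_squared(observed, predicted), 4))
```

Because the attendance rate is a proportion, an RMSE is read on the same scale as the rate itself, while R² gives the share of variance in the observed rate that the predictions account for.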