Moving from Machine Learning to Statistics: the case of Expected Points in American football
Ryan S. Brill, Ryan Yee, Sameer K. Deshpande, Abraham J. Wyner
arXiv - STAT - Applications, 2024-09-07. DOI: arxiv-2409.04889 (https://doi.org/arxiv-2409.04889)
Expected points is a value function fundamental to player evaluation and
strategic in-game decision-making across sports analytics, particularly in
American football. To estimate expected points, football analysts use machine
learning tools, which are not equipped to handle certain challenges. They
suffer from selection bias, display counter-intuitive artifacts of overfitting,
do not quantify uncertainty in point estimates, and do not account for the
strong dependence structure of observational football data. These issues are
not unique to American football or even sports analytics; they are general
problems analysts encounter across various statistical applications,
particularly when using machine learning in lieu of traditional statistical
models. We explore these issues in detail and devise expected points models
that account for them. We also introduce a widely applicable novel
methodological approach to mitigate overfitting, using a catalytic prior to
smooth our machine learning models.
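
The smoothing idea in the last sentence can be illustrated with a toy sketch. This is not the authors' model; it only shows the general catalytic-prior recipe under stated assumptions: fit a simple model, simulate synthetic observations from it, and refit the flexible model on the observed data plus the synthetic data, with the synthetic data sharing a small total weight `tau`. Everything here is hypothetical: the function names, the toy data, the use of a straight line as the simple model, per-bin averages as a stand-in for a machine learning model, and the choice to draw synthetic yardlines uniformly.

```python
import random
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; plays the role of the simple model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def bin_means(xs, ys, ws, width=10):
    """Weighted mean of y per yardline bin; plays the role of the flexible model."""
    num, den = defaultdict(float), defaultdict(float)
    for x, y, w in zip(xs, ys, ws):
        num[x // width] += w * y
        den[x // width] += w
    return {k: num[k] / den[k] for k in den if den[k] > 0}


def catalytic_bin_means(xs, ys, tau=20.0, m=2000, width=10, seed=0):
    """Refit the bin means on observed data plus m synthetic points.

    The synthetic points are simulated from the fitted line and share a
    total weight of tau, so they act like tau extra 'prior' observations.
    """
    rng = random.Random(seed)
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sd = (sse / max(len(xs) - 2, 1)) ** 0.5  # residual standard deviation
    # Assumption: synthetic yardlines are drawn uniformly from 1..99.
    syn_x = [rng.randrange(1, 100) for _ in range(m)]
    syn_y = [a + b * x + rng.gauss(0.0, sd) for x in syn_x]
    ws = [1.0] * len(xs) + [tau / m] * m
    return bin_means(xs + syn_x, ys + syn_y, ws, width)


# Toy data (invented): x = yards from the opponent's end zone, y = points of
# the next score. One lone observation at x = 95 makes bin 9 easy to overfit.
rng = random.Random(1)
xs = [rng.randrange(1, 90) for _ in range(300)] + [95]
ys = [5.0 - 0.06 * x + rng.gauss(0.0, 3.0) for x in xs[:-1]] + [7.0]

raw = bin_means(xs, ys, [1.0] * len(xs))   # unsmoothed flexible fit
smooth = catalytic_bin_means(xs, ys)       # smoothed toward the simple model
```

In this sketch the unsmoothed estimate for bin 9 is just the single observed value, while the smoothed estimate is pulled toward what the straight line predicts there. Setting `tau=0.0` recovers the unsmoothed fit, and larger `tau` shrinks harder toward the simple model.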