Using Scouting Reports Text To Predict NCAA → NBA Performance
Philip Z. Maymin
Journal of Business Analytics (impact factor 1.7; JCR Q3, Computer Science, Information Systems), published 2021-01-02
DOI: https://doi.org/10.1080/2573234X.2021.1873077
Citations: 1
Abstract
Draft decisions by National Basketball Association (NBA) teams are notoriously poor. Analytics can help but are often dismissed for being too overfit, complex, risky, and incomplete. To address these concerns, we train separate leave-one-out random forests machine learning models for each collegiate NBA prospect from 2006 through 2019 with a conservative utility function on a novel comprehensive dataset including the raw text of scouting reports, combine measurements, on-court stats, mock draft placements, and more. Despite being unable to draft high school or international players, the resulting model outperforms the actual decisions of all but one NBA team, with an average gain of $100 million. Target shuffling shows that the model does not overfit and feature shuffling shows that handedness and ESPN mock draft rating, but not other mock drafts, are most important. NBA teams may be missing value by not following a disciplined, model-driven, prescriptive analytics approach to decision making.
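The two diagnostics named in the abstract, target shuffling and feature shuffling, can be illustrated with a minimal sketch. This is not the paper's implementation: to stay dependency-free it swaps the random forest for a simple k-nearest-neighbour regressor, and it runs on synthetic data in which only the first feature drives the target. The function names (`loo_mse`, `target_shuffle`, `feature_shuffle`) and all parameters are hypothetical, chosen only for illustration.

```python
import random


def loo_mse(X, y, k=5):
    """Leave-one-out mean squared error of a k-nearest-neighbour regressor.

    Assumption: k-NN is a stand-in here; the paper uses random forests.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # Rank every *other* row by squared Euclidean distance to row i,
        # so the held-out row never informs its own prediction.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])),
        )
        pred = sum(y[j] for j in others[:k]) / k
        total += (pred - y[i]) ** 2
    return total / n


def target_shuffle(X, y, rng, rounds=20):
    """Re-score the model on randomly permuted targets.

    If the real score is no better than these, the model is fitting noise.
    """
    scores = []
    for _ in range(rounds):
        y_perm = y[:]
        rng.shuffle(y_perm)
        scores.append(loo_mse(X, y_perm))
    return scores


def feature_shuffle(X, y, rng):
    """Increase in leave-one-out error when each column is permuted in turn."""
    base = loo_mse(X, y)
    gains = []
    for col in range(len(X[0])):
        values = [row[col] for row in X]
        rng.shuffle(values)
        X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, values)]
        gains.append(loo_mse(X_perm, y) - base)
    return gains


# Synthetic data (an assumption): 60 "prospects", 3 features,
# and a target that depends only on feature 0 plus small noise.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(60)]
y = [5.0 * row[0] + rng.gauss(0.0, 0.1) for row in X]

real_mse = loo_mse(X, y)
shuffled = target_shuffle(X, y, rng)
importance = feature_shuffle(X, y, rng)

print("real LOO MSE:", round(real_mse, 3))
print("best shuffled-target MSE:", round(min(shuffled), 3))
print("error increase per shuffled feature:", [round(g, 3) for g in importance])
```

In this toy setting, a real error well below every shuffled-target error suggests the model is picking up signal instead of noise, and the feature whose permutation raises the error most is the one the model relies on. The paper applies the same logic to its own models and utility function.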