Predicting All Star Player in the National Basketball Association Using Random Forest

Ghada M. A. Soliman, Ala'a El-Nabawy, A. Misbah, S. Eldawlatly
2017 Intelligent Systems Conference (IntelliSys), published 2017-09-01
DOI: 10.1109/INTELLISYS.2017.8324371 (https://doi.org/10.1109/INTELLISYS.2017.8324371)
Citations: 9

Abstract

The National Basketball Association (NBA) All-Star Game is an exhibition game played between selected players from the Western and Eastern Conferences. Selection for the game depends purely on votes: fans and coaches vote for players and thereby decide who makes the All-Star roster. A player who keeps receiving enough votes in subsequent years plays in more All-Star Games. Because selection is based on voting, it is subjective, and there are no selection criteria that remove human bias and opinion. Analyzing historical sports league data can provide insight into the factors that lead to winning games and titles. This study aims to classify NBA players as regular or All-Star players and to identify the most important characteristics that make a player an All-Star. To accomplish this, performance per minute of play and performance per average of total minutes played were analyzed using the Random Forest implementation in Apache Spark's scalable machine learning library, in order to identify which variables best predict the regular and All-Star categories. The NBA men's basketball dataset used is publicly available from Open Source Sports and covers the period from 1937 to 2011. Random Forest predicted All-Star players with an accuracy of 92.5% when using performance per average of total minutes played, and 92.48% when using performance per minute of play. The results identified the important features that contribute significantly to a player's scoring and performance index rating. The study follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to address the data mining problem in a consistent and professional way. CRISP-DM presents a hierarchical and iterative process model and provides an extensible framework with a generic-to-specific approach, starting from six phases that are further detailed by generic and then specialized tasks.
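
The classification idea in the abstract (per-minute performance features fed to a random forest, then scored by accuracy) can be sketched roughly as follows. This is not the authors' pipeline: the study used the Random Forest in Apache Spark's machine learning library, whereas this is a pure-Python stand-in built from bagged one-split trees (decision stumps) with a random feature choice per tree. The season rows, the three features, the function names, and the tree count are all assumptions made up for illustration, and the accuracy it prints is a training accuracy on toy data, so it is not comparable to the 92.5% reported above.

```python
import random
from collections import Counter

# Invented season totals for illustration only (NOT from the paper's dataset):
# (minutes, points, rebounds, assists, label); label 1 = All-Star, 0 = regular.
SEASONS = [
    (3000, 2400, 900, 600, 1),
    (2900, 2030, 870, 725, 1),
    (2800, 2100, 700, 560, 1),
    (3100, 2170, 930, 620, 1),
    (2000, 600, 200, 100, 0),
    (1500, 300, 225, 75, 0),
    (1800, 540, 180, 144, 0),
    (1200, 300, 120, 60, 0),
]


def per_minute(row):
    """Turn season totals into per-minute rates (hypothetical feature set)."""
    minutes, points, rebounds, assists, _label = row
    return [points / minutes, rebounds / minutes, assists / minutes]


def fit_stump(X, y, features):
    """Find the single split (feature, threshold) with the fewest training errors."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2  # candidate threshold between two observed values
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(v != left_label for v in left)
            errors += sum(v != right_label for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(X, y, n_trees=15, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # number of features tried per tree
    forest = []
    while len(forest) < n_trees:
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        labels = [y[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # redraw: on this tiny toy set a one-class sample has no useful split
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump([X[i] for i in idx], labels, features))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


X = [per_minute(r) for r in SEASONS]
y = [r[-1] for r in SEASONS]
forest = fit_forest(X, y)
predictions = [predict(forest, row) for row in X]
print("training accuracy:", accuracy(y, predictions))
```

The toy classes are deliberately well separated, so the forest fits them without error; real player data would need deeper trees, a held-out test split, and the library's feature-importance output to reproduce the kind of analysis the abstract describes.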