Title: The Impact of Data-Complexity and Team Characteristics on Performance in the Classification Model
Authors: V. Pungpapong, Prasert Kanawattanachai
Journal: International Journal of Business Analytics (Impact Factor 0.6; JCR Q4, Business)
Publication date: 2022-01-01
Publication type: Journal Article
DOI: 10.4018/ijban.288517 (https://doi.org/10.4018/ijban.288517)
Citation count: 0
Abstract
This article investigates the impact of data-complexity and team-specific characteristics on machine learning competition scores. Data from five real-world binary classification competitions hosted on Kaggle.com were analyzed. Data complexity was measured along four dimensions: standard measures, sparsity measures, class imbalance measures, and feature-based measures. The results showed that the higher the level of data complexity, the lower the predictive ability of the machine learning model. Our empirical evidence revealed that the imbalance ratio of the target variable was the most important factor and exhibited a nonlinear relationship with the model’s predictive ability: it adversely affected predictive performance once it reached a certain level. However, the results were mixed for the impact on team performance of team-specific characteristics, measured by team size, team expertise, and the number of submissions. For high-performing teams, these factors had no impact on team score.
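As a rough illustration of two of the measure families named in the abstract, the sketch below computes an imbalance ratio and a sparsity score. The abstract does not give the authors' formulas, so both definitions are assumptions: the imbalance ratio is taken as the majority-class count divided by the minority-class count, and sparsity as the share of feature values that are exactly zero. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Majority-class count divided by minority-class count.

    Assumed definition; the paper may define the ratio differently.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes in a binary target")
    majority, minority = sorted(counts.values(), reverse=True)
    return majority / minority


def sparsity(rows: Sequence[Sequence[float]]) -> float:
    """Fraction of feature values that are exactly zero.

    Assumed definition; the paper may measure sparsity differently.
    """
    total = sum(len(row) for row in rows)
    if total == 0:
        raise ValueError("feature matrix is empty")
    zeros = sum(1 for row in rows for value in row if value == 0)
    return zeros / total


if __name__ == "__main__":
    # Toy data, not taken from the competitions analyzed in the paper.
    y = [0] * 90 + [1] * 10
    X = [[0.0, 1.5, 0.0], [2.0, 0.0, 0.0]]
    print(imbalance_ratio(y))
    print(sparsity(X))
```

Under this assumed definition, a perfectly balanced target has a ratio of 1.0, and larger values indicate a rarer minority class.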
About the journal:
The main objective of the International Journal of Business Analytics (IJBAN) is to advance the next frontier of decision sciences and provide an international forum for practitioners and researchers in business and governmental organizations—as well as information technology professionals, software developers, and vendors—to exchange, share, and present useful and innovative ideas and work. The journal encourages exploration of different models, methods, processes, and principles in profitable and actionable manners.