The Impact of Data-Complexity and Team Characteristics on Performance in the Classification Model

Impact Factor 0.8, JCR Q4 (Business)

International Journal of Business Analytics. Pub Date: 2022-01-01. DOI: 10.4018/ijban.288517

V. Pungpapong, Prasert Kanawattanachai

Citations: 0

Abstract

The Impact of Data-Complexity and Team Characteristics on Performance in the Classification Model

This article investigates the impact of data-complexity and team-specific characteristics on machine learning competition scores. Data from five real-world binary classification competitions hosted on Kaggle.com were analyzed. Data complexity was measured along four aspects: standard measures, sparsity measures, class imbalance measures, and feature-based measures. The results showed that the higher the level of the data-complexity characteristics, the lower the predictive ability of the machine learning model. The empirical evidence revealed that the imbalance ratio of the target variable was the most important factor and exhibited a nonlinear relationship with the model’s predictive ability. The imbalance ratio adversely affected predictive performance once it reached a certain level. However, results were mixed for the impact of team-specific characteristics, measured by team size, team expertise, and the number of submissions, on team performance. For high-performing teams, these factors had no impact on team score.
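
To make two of the complexity measures named in the abstract concrete, here is a minimal Python sketch. The abstract does not give the authors' formulas, so both definitions are assumptions: the imbalance ratio is taken as the majority-class count divided by the minority-class count, and sparsity as the fraction of feature cells equal to zero. The function names and the toy data are hypothetical and are not taken from the paper or from any Kaggle competition.

```python
from collections import Counter
from typing import Sequence


def imbalance_ratio(labels: Sequence[int]) -> float:
    """Majority-class count divided by minority-class count.

    Assumed definition; the paper may define the ratio differently.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    majority = max(counts.values())
    minority = min(counts.values())
    return majority / minority


def sparsity(rows: Sequence[Sequence[float]]) -> float:
    """Fraction of feature cells that are exactly zero (assumed definition)."""
    total = sum(len(row) for row in rows)
    if total == 0:
        raise ValueError("empty feature matrix")
    zeros = sum(1 for row in rows for value in row if value == 0)
    return zeros / total


# Toy data, invented for illustration only.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
features = [[0.0, 1.5], [0.0, 0.0], [2.0, 0.0], [0.0, 3.1]]

print(imbalance_ratio(labels))  # 6 majority / 2 minority
print(sparsity(features))       # 5 zero cells out of 8
```

A perfectly balanced target gives a ratio of 1.0 under this definition, and larger values indicate stronger imbalance, which is the direction in which the abstract reports predictive performance eventually degrading.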

Source journal

International Journal of Business Analytics (Business)

CiteScore: 2.30

Self-citation rate: 27.30%

About the journal: The main objective of the International Journal of Business Analytics (IJBAN) is to advance the next frontier of decision sciences and provide an international forum for practitioners and researchers in business and governmental organizations, as well as information technology professionals, software developers, and vendors, to exchange, share, and present useful and innovative ideas and work. The journal encourages exploration of different models, methods, processes, and principles in profitable and actionable ways.