Who will Win the Data Science Competition? Insights from KDD Cup 2019 and Beyond

Hao Liu, Qingyu Guo, Hengshu Zhu, Fuzhen Zhuang, Shen Yang, D. Dou, Hui Xiong
ACM Transactions on Knowledge Discovery from Data (TKDD). Published 2022-04-05. DOI: https://doi.org/10.1145/3511896

Abstract

Data science competitions are increasingly popular among enterprises as a way to collect advanced, innovative solutions, and among contestants as a way to sharpen their data science skills. Most existing studies of data science competitions focus on improving task-specific techniques, such as algorithm design and parameter tuning. However, little effort has been made to understand the data science competition itself. To this end, in this article, we shed light on teams’ competition performance and investigate how that performance evolves in the crowd-sourcing competitive innovation context. Specifically, we first acquire and construct multi-sourced datasets of various data science competitions, including the KDD Cup 2019 machine learning competition and beyond. Then, we conduct an empirical analysis to identify and quantify a rich set of features that are significantly correlated with teams’ future performance. Using a team’s rank as a proxy for its performance, we observe “the stronger, the stronger” rule; that is, top-ranked teams tend to keep their advantage and dominate weaker teams for the rest of the competition. Our results also confirm that teams with diversified backgrounds tend to achieve better performance. After that, we formulate the team future rank prediction problem and propose the Multi-Task Representation Learning (MTRL) framework to model both static and dynamic features. Extensive experimental results on four real-world data science competitions demonstrate that a team’s future performance can be well predicted by MTRL. Finally, we envision that our study will not only help competition organizers better understand the competition, but also provide strategic implications for contestants, such as guiding team formation and designing the submission strategy.
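
As an illustration of what “the stronger, the stronger” rule would look like in data, the sketch below computes a Spearman rank correlation between teams' early and final leaderboard positions. This is not the analysis or the MTRL framework from the article. The function name `spearman_rho` and the five-team leaderboard are hypothetical and chosen only for illustration, and the formula used assumes rankings without ties.

```python
from typing import Sequence


def spearman_rho(early_ranks: Sequence[int], final_ranks: Sequence[int]) -> float:
    """Spearman rank correlation for two tie-free rankings of the same teams."""
    n = len(early_ranks)
    if n != len(final_ranks) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    # Sum of squared rank differences, team by team.
    d2 = sum((e - f) ** 2 for e, f in zip(early_ranks, final_ranks))
    # Closed-form Spearman coefficient, valid only when there are no ties.
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical leaderboard positions for five teams (not data from the article).
early = [1, 2, 3, 4, 5]
final = [1, 3, 2, 4, 5]
print(f"Spearman rho = {spearman_rho(early, final):.2f}")
```

A coefficient close to 1 would mean that early rank order is largely preserved at the end of the competition, which is the pattern the rule describes; a value near 0 would indicate that early standings say little about final ones.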