What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems

B. Schreck, K. Veeramachaneni
{"title":"What Would a Data Scientist Ask? Automatically Formulating and Solving Predictive Problems","authors":"B. Schreck, K. Veeramachaneni","doi":"10.1109/DSAA.2016.55","DOIUrl":null,"url":null,"abstract":"In this paper, we designed a formal language, called Trane, for describing prediction problems over relational datasets, implemented a system that allows data scientists to specify problems in that language. We show that this language is able to describe several prediction problems and even the ones on KAGGLE-a data science competition website. We express 29 different KAGGLE problems in this language. We designed an interpreter, which translates input from the user, specified in this language, into a series of transformation and aggregation operations to apply to a dataset in order to generate labels that can be used to train a supervised machine learning classifier. Using a smaller subset of this language, we developed a system to automatically enumerate, interpret and solve prediction problems. We tested this system on the Walmart Store Sales Forecasting dataset found on KAGGLE, enumerated 1077 prediction problems and built models that attempted to solve them, for which we produced 235 AUC scores. Considering that only one out of those 1077 problems was the focus of a 2.5 month long competition on KAGGLE, we expect this system to deliver a thousandfold increase in data scientist's productivity.","PeriodicalId":193885,"journal":{"name":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","volume":" 22","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSAA.2016.55","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

In this paper, we designed a formal language, called Trane, for describing prediction problems over relational datasets, implemented a system that allows data scientists to specify problems in that language. We show that this language is able to describe several prediction problems and even the ones on KAGGLE-a data science competition website. We express 29 different KAGGLE problems in this language. We designed an interpreter, which translates input from the user, specified in this language, into a series of transformation and aggregation operations to apply to a dataset in order to generate labels that can be used to train a supervised machine learning classifier. Using a smaller subset of this language, we developed a system to automatically enumerate, interpret and solve prediction problems. We tested this system on the Walmart Store Sales Forecasting dataset found on KAGGLE, enumerated 1077 prediction problems and built models that attempted to solve them, for which we produced 235 AUC scores. Considering that only one out of those 1077 problems was the focus of a 2.5 month long competition on KAGGLE, we expect this system to deliver a thousandfold increase in data scientist's productivity.
数据科学家会问什么?自动制定和解决预测问题
在本文中,我们设计了一种称为Trane的形式化语言,用于描述关系数据集上的预测问题,实现了一个允许数据科学家用该语言指定问题的系统。我们证明了这种语言能够描述几个预测问题,甚至是数据科学竞赛网站kaggle上的预测问题。我们用这种语言表达了29种不同的KAGGLE问题。我们设计了一个解释器,它将来自用户的输入翻译成一系列转换和聚合操作,以应用于数据集,以生成可用于训练监督机器学习分类器的标签。使用该语言的一个较小子集,我们开发了一个系统来自动枚举、解释和解决预测问题。我们在KAGGLE上找到的沃尔玛商店销售预测数据集上测试了这个系统,列举了1077个预测问题,并建立了试图解决这些问题的模型,为此我们产生了235个AUC分数。考虑到这1077个问题中只有一个是KAGGLE长达2.5个月的竞赛的焦点,我们期望这个系统能够将数据科学家的工作效率提高1000倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信