Model-based Pricing: Do Not Pay for More than What You Learn!
Lingjiao Chen, Paraschos Koutris, Arun Kumar
Proceedings of the 1st Workshop on Data Management for End-to-End Machine Learning, 2017-05-14
DOI: https://doi.org/10.1145/3076246.3076250
Citation count: 5
Abstract
While much work has focused on improving the efficiency, scalability, and usability of machine learning (ML), little has studied the cost of data acquisition for ML-based analytics. Datasets are already being bought and sold in marketplaces for various tasks, including ML. Current marketplaces, however, force users to buy such data whole or as fixed subsets, with no awareness of the ML tasks the data will be used for. This leads to sub-optimal choices and missed opportunities for both data sellers and buyers. In this paper, we outline our vision for a formal and practical pricing framework, which we call model-based pricing, that aims to resolve such issues. Our key observation is that ML users typically need only as much data as it takes to meet their accuracy goals, which leads to novel trade-offs among price, accuracy, and runtime. We explain how this raises interesting new research questions at the intersection of data management, ML, and micro-economics.