Smarter Warehouse
N. Laptev, Wenbo Tao, C. Komurlu, Jason Xu, Deke Sun, T. Lux, Luo Mi
2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW), published 2022-05-01
DOI: https://doi.org/10.1109/icdew55742.2022.00005
Abstract
Warehouse users often have to make too many decisions about their queries, pipelines, workflows, and data in order to optimize the resources they use as well as the quality and availability of their data. For example, they must decide whether to use Spark or Presto, how best to partition their data, and which hyper-parameters to tune to resolve various query or pipeline problems. Furthermore, warehouse users are often unaware of significant performance opportunities around data skew, multi-query optimization, query materialization, and more. In this paper we describe the Smarter Warehouse initiative, which aims to automate or simplify many of these optimization decisions. Our long-term vision is for a large portion of the Smarter Warehouse optimizations to be seamlessly incorporated into the compute and I/O layers of the stack, leading to a simpler warehouse user experience and substantial resource savings.