Proceedings of the Fourth Workshop on Data analytics in the Cloud: Latest Publications

The Vision of BigBench 2.0

Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562.2799642

T. Rabl, Michael Frank, Manuel Danisch, H. Jacobsen, B. Gowda

Citations: 14

Speculative Approximations for Terascale Distributed Gradient Descent Optimization

Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562.2799563

Chengjie Qin, Florin Rusu

Abstract: Model calibration is a major challenge faced by the plethora of statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination, even by experienced data scientists. We argue that the incapacity to evaluate multiple parameter configurations simultaneously and the lack of support to quickly identify sub-optimal configurations are the principal causes. In this paper, we develop two database-inspired techniques for efficient model calibration. Speculative parameter testing applies advanced parallel multi-query processing methods to evaluate several configurations concurrently. Online aggregation is applied to identify sub-optimal configurations early in the processing by incrementally sampling the training dataset and estimating the objective function corresponding to each configuration. We design concurrent online aggregation estimators and define halting conditions to stop the execution accurately and in a timely manner. We apply the proposed techniques to distributed gradient descent optimization (batch and incremental) for support vector machines and logistic regression models. We implement the resulting solutions in GLADE PF-OLA, a state-of-the-art Big Data analytics system, and evaluate their performance over terascale-size synthetic and real datasets. The results confirm that as many as 32 configurations can be evaluated concurrently almost as fast as one, while sub-optimal configurations are detected accurately in as little as 1/20th of the time.
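
The two ideas named in the abstract above, testing several configurations in one shared pass and stopping poor ones early from sampled estimates, can be illustrated with a small single-machine sketch. This is not the GLADE PF-OLA implementation. The function names, the scalar one-feature logistic model, the batch size, and the normal-approximation confidence interval used as a halting rule are all assumptions made for illustration only.

```python
import math
import random


def logistic_loss(w, x, y):
    """Logistic loss of one example (x, y), y in {-1, +1}, for a scalar weight w."""
    z = -y * w * x
    # Numerically stable form of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def prune_step_sizes(data, w0, grad, step_sizes, batch=200, z_score=1.96):
    """Return (surviving step sizes, number of examples scanned).

    Hypothetical helper: not an API of any real system.
    """
    # Speculation: each step size yields its own candidate model.
    models = {a: w0 - a * grad for a in step_sizes}
    stats = {a: [0.0, 0.0] for a in step_sizes}  # running sum, sum of squares
    alive = set(step_sizes)
    scanned = 0
    for start in range(0, len(data), batch):
        chunk = data[start:start + batch]
        scanned += len(chunk)
        # One shared pass over the chunk serves every live candidate.
        for x, y in chunk:
            for a in alive:
                loss = logistic_loss(models[a], x, y)
                stats[a][0] += loss
                stats[a][1] += loss * loss
        # Normal-approximation confidence interval for each mean loss.
        bounds = {}
        for a in alive:
            total, squares = stats[a]
            mean = total / scanned
            var = max(squares / scanned - mean * mean, 0.0)
            half = z_score * math.sqrt(var / scanned)
            bounds[a] = (mean - half, mean + half)
        # Assumed halting rule: drop a candidate once its lower bound
        # exceeds the smallest upper bound among the live candidates.
        best_upper = min(hi for _, hi in bounds.values())
        alive = {a for a in alive if bounds[a][0] <= best_upper}
        if len(alive) == 1:
            break
    return alive, scanned


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
    data = [(x, 1 if x + random.gauss(0.0, 0.5) > 0 else -1) for x in xs]
    # Gradient of the mean logistic loss at w0 = 0 is -mean(y * x) / 2.
    grad = -sum(x * y for x, y in data) / (2 * len(data))
    print(prune_step_sizes(data, 0.0, grad, [0.01, 0.1, 1.0, 10.0]))
```

The second return value shows how much of the dataset was read before the decision was made. The sketch assumes the data arrives in random order, so that each prefix behaves like a random sample.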

Citations: 20

Proceedings of the Fourth Workshop on Data analytics in the Cloud

Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562

Asterios Katsifodimos

Citations: 0

High-Performance Main-Memory Database Systems and Modern Virtualization: Friends or Foes?

Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562.2799643

Tobias Mühlbauer, Wolf Rödiger, Andreas Kipf, A. Kemper, Thomas Neumann

Abstract: Virtualization owes its popularity mainly to its ability to consolidate software systems from many servers into a single server without sacrificing the desirable isolation between applications. This not only reduces the total cost of ownership, but also enables rapid deployment of complex software and application-agnostic live migration between servers for load balancing, high availability, and fault tolerance. However, virtualization is no free lunch. To achieve isolation, virtualization environments need to add a layer of abstraction between the bare-metal hardware and the application. This inevitably introduces a performance overhead. High-performance main-memory database systems are particularly susceptible to additional software abstractions, as they are closely optimized and tuned for the underlying hardware. In this work, we analyze in detail how much overhead modern virtualization options introduce for high-performance main-memory database systems. We evaluate and compare the performance of HyPer and MonetDB under three modern virtualization environments for analytical as well as transactional workloads. Our experiments show that the overhead depends on the system and virtualization environment being used. We further show that main-memory database systems can be efficiently deployed in virtualized cloud environments such as the Google Compute Engine and that "friendship" between modern virtualization and main-memory database systems is indeed possible.

Citations: 6