{"title":"The Vision of BigBench 2.0","authors":"T. Rabl, Michael Frank, Manuel Danisch, H. Jacobsen, B. Gowda","doi":"10.1145/2799562.2799642","DOIUrl":"https://doi.org/10.1145/2799562.2799642","url":null,"abstract":"Data is one of the most important resources for modern enterprises. Better analytics allow for a better understanding of customer requirements and market dynamics. The more data is collected, the more information can be extracted. However, information value extraction is limited by data processing speeds. Due to fast technological advances in big data management there is an abundance of big data systems. This leaves users in the dilemma of choosing a system that features good end-to-end performance for the use case. To get a good understanding of the actual performance of a system, realistic application level workloads are required. To this end, we have developed BigBench, an application level benchmark focused only on big data analytics. In this paper, we present the vision of BigBench 2.0, a suite of benchmarks for all major aspects of big data processing in common business use cases. Unlike other efforts, BigBench 2.0 will have completely consistent and integrated model and workload, which will allow realistic end-to-end benchmarking of big data systems.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"329 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116790529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speculative Approximations for Terascale Distributed Gradient Descent Optimization","authors":"Chengjie Qin, Florin Rusu","doi":"10.1145/2799562.2799563","DOIUrl":"https://doi.org/10.1145/2799562.2799563","url":null,"abstract":"Model calibration is a major challenge faced by the plethora of statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination even by experienced data scientists. We argue that the incapacity to evaluate multiple parameter configurations simultaneously and the lack of support to quickly identify sub-optimal configurations are the principal causes. In this paper, we develop two database-inspired techniques for efficient model calibration. Speculative parameter testing applies advanced parallel multi-query processing methods to evaluate several configurations concurrently. Online aggregation is applied to identify sub-optimal configurations early in the processing by incrementally sampling the training dataset and estimating the objective function corresponding to each configuration. We design concurrent online aggregation estimators and define halting conditions to accurately and timely stop the execution. We apply the proposed techniques to distributed gradient descent optimization -- batch and incremental -- for support vector machines and logistic regression models. We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big Data analytics system -- and evaluate their performance over terascale-size synthetic and real datasets. The results confirm that as many as 32 configurations can be evaluated concurrently almost as fast as one, while sub-optimal configurations are detected accurately in as little as a 1/20th fraction of the time.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132035787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","authors":"Asterios Katsifodimos","doi":"10.1145/2799562","DOIUrl":"https://doi.org/10.1145/2799562","url":null,"abstract":"Data nowadays comes from various sources including log files, transactional applications, the Web, social media, scientific experiments, and many others. In recent years, various analyses of these data have proven useful to aid companies in engaging and serving their users and defining their corporate strategy, help political candidates win elections, and transform the process of scientific discovery. However, these successes are just the tip of the iceberg: Every day, new, more complex analysis techniques are devised and larger, more varied datasets are accumulated. Tackling the complexity of both the data itself and its analysis remains an open challenge.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126442485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Performance Main-Memory Database Systems and Modern Virtualization: Friends or Foes?","authors":"Tobias Mühlbauer, Wolf Rödiger, Andreas Kipf, A. Kemper, Thomas Neumann","doi":"10.1145/2799562.2799643","DOIUrl":"https://doi.org/10.1145/2799562.2799643","url":null,"abstract":"Virtualization owes its popularity mainly to its ability to consolidate software systems from many servers into a single server without sacrificing the desirable isolation between applications. This not only reduces the total cost of ownership, but also enables rapid deployment of complex software and application-agnostic live migration between servers for load balancing, high-availability, and fault-tolerance. However, virtualization is no free lunch. To achieve isolation, virtualization environments need to add an additional layer of abstraction between the bare metal hardware and the application. This inevitably introduces a performance overhead. High-performance main-memory database systems are specifically susceptible to additional software abstractions as they are closely optimized and tuned for the underlying hardware. In this work, we analyze in detail how much overhead modern virtualization options introduce for high-performance main-memory database systems. We evaluate and compare the performance of HyPer and MonetDB under three modern virtualization environments for analytical as well as transactional workloads. Our experiments show that the overhead depends on the system and virtualization environment being used. We further show that main-memory database systems can be efficiently deployed in virtualized cloud environments such as the Google Compute Engine and that \"friendship\" between modern virtualization and main-memory database systems is indeed possible.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116326182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}