Proceedings of the Fourth Workshop on Data analytics in the Cloud: Latest Publications

The Vision of BigBench 2.0
Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562.2799642
T. Rabl, Michael Frank, Manuel Danisch, H. Jacobsen, B. Gowda
{"title":"The Vision of BigBench 2.0","authors":"T. Rabl, Michael Frank, Manuel Danisch, H. Jacobsen, B. Gowda","doi":"10.1145/2799562.2799642","DOIUrl":"https://doi.org/10.1145/2799562.2799642","url":null,"abstract":"Data is one of the most important resources for modern enterprises. Better analytics allow for a better understanding of customer requirements and market dynamics. The more data is collected, the more information can be extracted. However, information value extraction is limited by data processing speeds. Due to fast technological advances in big data management there is an abundance of big data systems. This leaves users in the dilemma of choosing a system that features good end-to-end performance for the use case. To get a good understanding of the actual performance of a system, realistic application level workloads are required. To this end, we have developed BigBench, an application level benchmark focused only on big data analytics. In this paper, we present the vision of BigBench 2.0, a suite of benchmarks for all major aspects of big data processing in common business use cases. Unlike other efforts, BigBench 2.0 will have completely consistent and integrated model and workload, which will allow realistic end-to-end benchmarking of big data systems.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"329 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116790529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Speculative Approximations for Terascale Distributed Gradient Descent Optimization
Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562.2799563
Chengjie Qin, Florin Rusu
{"title":"Speculative Approximations for Terascale Distributed Gradient Descent Optimization","authors":"Chengjie Qin, Florin Rusu","doi":"10.1145/2799562.2799563","DOIUrl":"https://doi.org/10.1145/2799562.2799563","url":null,"abstract":"Model calibration is a major challenge faced by the plethora of statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination even by experienced data scientists. We argue that the incapacity to evaluate multiple parameter configurations simultaneously and the lack of support to quickly identify sub-optimal configurations are the principal causes. In this paper, we develop two database-inspired techniques for efficient model calibration. Speculative parameter testing applies advanced parallel multi-query processing methods to evaluate several configurations concurrently. Online aggregation is applied to identify sub-optimal configurations early in the processing by incrementally sampling the training dataset and estimating the objective function corresponding to each configuration. We design concurrent online aggregation estimators and define halting conditions to accurately and timely stop the execution. We apply the proposed techniques to distributed gradient descent optimization -- batch and incremental -- for support vector machines and logistic regression models. We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big Data analytics system -- and evaluate their performance over terascale-size synthetic and real datasets. The results confirm that as many as 32 configurations can be evaluated concurrently almost as fast as one, while sub-optimal configurations are detected accurately in as little as a 1/20th fraction of the time.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132035787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Proceedings of the Fourth Workshop on Data analytics in the Cloud
Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562
Asterios Katsifodimos
{"title":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","authors":"Asterios Katsifodimos","doi":"10.1145/2799562","DOIUrl":"https://doi.org/10.1145/2799562","url":null,"abstract":"Data nowadays comes from various sources including log files, transactional applications, the Web, social media, scientific experiments, and many others. In recent years, various analyses of these data have proven useful to aid companies in engaging and serving their users and defining their corporate strategy, help political candidates win elections, and transform the process of scientific discovery. However, these successes are just the tip of the iceberg: Every day, new, more complex analysis techniques are devised and larger, more varied datasets are accumulated. Tackling the complexity of both the data itself and its analysis remains an open challenge.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126442485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
High-Performance Main-Memory Database Systems and Modern Virtualization: Friends or Foes?
Proceedings of the Fourth Workshop on Data analytics in the Cloud Pub Date : 2015-05-31 DOI: 10.1145/2799562.2799643
Tobias Mühlbauer, Wolf Rödiger, Andreas Kipf, A. Kemper, Thomas Neumann
{"title":"High-Performance Main-Memory Database Systems and Modern Virtualization: Friends or Foes?","authors":"Tobias Mühlbauer, Wolf Rödiger, Andreas Kipf, A. Kemper, Thomas Neumann","doi":"10.1145/2799562.2799643","DOIUrl":"https://doi.org/10.1145/2799562.2799643","url":null,"abstract":"Virtualization owes its popularity mainly to its ability to consolidate software systems from many servers into a single server without sacrificing the desirable isolation between applications. This not only reduces the total cost of ownership, but also enables rapid deployment of complex software and application-agnostic live migration between servers for load balancing, high-availability, and fault-tolerance. However, virtualization is no free lunch. To achieve isolation, virtualization environments need to add an additional layer of abstraction between the bare metal hardware and the application. This inevitably introduces a performance overhead. High-performance main-memory database systems are specifically susceptible to additional software abstractions as they are closely optimized and tuned for the underlying hardware. In this work, we analyze in detail how much overhead modern virtualization options introduce for high-performance main-memory database systems. We evaluate and compare the performance of HyPer and MonetDB under three modern virtualization environments for analytical as well as transactional workloads. Our experiments show that the overhead depends on the system and virtualization environment being used. We further show that main-memory database systems can be efficiently deployed in virtualized cloud environments such as the Google Compute Engine and that \"friendship\" between modern virtualization and main-memory database systems is indeed possible.","PeriodicalId":106601,"journal":{"name":"Proceedings of the Fourth Workshop on Data analytics in the Cloud","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116326182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6