Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond: Latest Papers

Cross-system NoSQL data transformations with NotaQL
Johannes Schildgen, Thomas Lottermann, S. Deßloch
{"title":"Cross-system NoSQL data transformations with NotaQL","authors":"Johannes Schildgen, Thomas Lottermann, S. Deßloch","doi":"10.1145/2926534.2926535","DOIUrl":"https://doi.org/10.1145/2926534.2926535","url":null,"abstract":"The rising adoption of NoSQL technology in enterprises causes a heterogeneous landscape of different data stores. Different stores provide distinct advantages and disadvantages, making it necessary for enterprises to facilitate multiple systems for specific purposes. This resulting polyglot persistence is difficult to handle for developers since some data needs to be replicated and aggregated between different and within the same stores. Currently, there are no uniform tools to perform these data transformations since all stores feature different APIs and data models. In this paper, we present the transformation language NotaQL that allows cross-system data transformations. These transformations are output-oriented, meaning that the structure of a transformation script is similar to that of the output. Besides, we provide an aggregation-centric approach, which makes aggregation operations as easy as possible.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132641970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Deterministic load balancing for parallel joins
Paraschos Koutris, Nivetha Singara Vadivelu
{"title":"Deterministic load balancing for parallel joins","authors":"Paraschos Koutris, Nivetha Singara Vadivelu","doi":"10.1145/2926534.2926536","DOIUrl":"https://doi.org/10.1145/2926534.2926536","url":null,"abstract":"We study the problem of distributing the tuples of a relation to a number of processors organized in an r-dimensional hypercube, which is an important task for parallel join processing. In contrast to previous work, which proposed randomized algorithms for the task, we ask here the question of how to construct efficient deterministic distribution strategies that can optimally load balance the input relation. We first present some general lower bounds on the load for any dimension; these bounds depend not only on the size of the relation, but also on the maximum frequency of each value in the relation. We then construct an algorithm for the case of 1 dimension that is optimal within a constant factor, and an algorithm for the case of 2 dimensions that is optimal within a polylogarithmic factor. Our 2-dimensional algorithm is based on an interesting connection with the vector load balancing problem, a well-studied problem that generalizes classic load balancing.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127910723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Bridging the gap: towards optimization across linear and relational algebra
Andreas Kunft, Alexander B. Alexandrov, Asterios Katsifodimos, V. Markl
{"title":"Bridging the gap: towards optimization across linear and relational algebra","authors":"Andreas Kunft, Alexander B. Alexandrov, Asterios Katsifodimos, V. Markl","doi":"10.1145/2926534.2926540","DOIUrl":"https://doi.org/10.1145/2926534.2926540","url":null,"abstract":"Advanced data analysis typically requires some form of pre-processing in order to extract and transform data before processing it with machine learning and statistical analysis techniques. Pre-processing pipelines are naturally expressed in dataflow APIs (e.g., MapReduce, Flink, etc.), while machine learning is expressed in linear algebra with iterations. Programmers therefore perform end-to-end data analysis utilizing multiple programming paradigms and systems. This impedance mismatch not only hinders productivity but also prevents optimization opportunities, such as sharing of physical data layouts (e.g., partitioning) and data structures among different parts of a data analysis program. The goal of this work is twofold. First, it aims to alleviate the impedance mismatch by allowing programmers to author complete end-to-end programs in one engine-independent language that is automatically parallelized. Second, it aims to enable joint optimizations over both relational and linear algebra. To achieve this goal, we present the design of Lara, a deeply embedded language in Scala which enables authoring scalable programs using two abstract data types (DataBag and Matrix) and control flow constructs. Programs written in Lara are compiled to an intermediate representation (IR) which enables optimizations across linear and relational algebra. The IR is finally used to compile code for different execution engines.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127134090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
On exploring efficient shuffle design for in-memory MapReduce
Harunobu Daikoku, H. Kawashima, O. Tatebe
{"title":"On exploring efficient shuffle design for in-memory MapReduce","authors":"Harunobu Daikoku, H. Kawashima, O. Tatebe","doi":"10.1145/2926534.2926538","DOIUrl":"https://doi.org/10.1145/2926534.2926538","url":null,"abstract":"MapReduce is commonly used as a way of big data analysis in many fields. Shuffling, the inter-node data exchange phase of MapReduce, has been reported as the major bottleneck of the framework. Acceleration of shuffling has been studied in literature, and we raise two questions in this paper. The first question pertains to the effect of Remote Direct Memory Access (RDMA) on the performance of shuffling. RDMA enables one machine to read and write data on the local memory of another and has been known to be an efficient data transfer mechanism. Does the pure use of RDMA affect the performance of shuffling? The second question is the data transfer algorithm to use. There are two types of shuffling algorithms for the conventional MapReduce implementations: Fully-Connected and more sophisticated algorithms such as Pairwise. Does the data transfer algorithm affect the performance of shuffling? To answer these questions, we designed and implemented yet another MapReduce system from scratch in C/C++ to gain the maximum performance and to reserve design flexibility. For the first question, we compared RDMA shuffling based on rsocket with the one based on IPoIB. The results of experiments with GroupBy showed that RDMA accelerates map+shuffle phase by around 50%. For the second question, we first compared our in-memory system with Apache Spark to investigate whether our system performed more efficiently than the existing system. Our system demonstrated performance improvement by a factor of 3.04 on Word Count, and by a factor of 2.64 on BiGram Count as compared to Spark. Then, we compared the two data exchange algorithms, Fully-Connected and Pairwise. The results of experiments with BiGram Count showed that Fully-Connected without RDMA was 13% more efficient than Pairwise with RDMA. We conclude that it is necessary to overlap map and shuffle phases to gain performance improvement. The reason for the relatively small percentage of improvement can be attributed to the time-consuming insertions of key-value pairs into the hash-map in the map phase.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124850305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Faucet: a user-level, modular technique for flow control in dataflow engines
Andrea Lattuada, Frank McSherry, Zaheer Chothia
{"title":"Faucet: a user-level, modular technique for flow control in dataflow engines","authors":"Andrea Lattuada, Frank McSherry, Zaheer Chothia","doi":"10.1145/2926534.2926544","DOIUrl":"https://doi.org/10.1145/2926534.2926544","url":null,"abstract":"This document presents Faucet, a modular flow control approach for distributed data-parallel dataflow engines with support for arbitrary (cyclic) topologies. When compared to existing backpressure techniques Faucet has the following differentiating characteristics: (i) the implementation only relies on existing progress information exposed by the system and does not require changes to the underlying dataflow system, (ii) it can be applied selectively to certain parts of the dataflow graph, and (iii) it is designed to support a wide variety of use cases, topologies and workloads. We demonstrate Faucet on an example computation for efficiently determining a cyclic join of relations, whose variability in rates of produced and consumed tuples challenges the flow control techniques employed by systems like Storm, Heron, and Spark. Our implementation, prototyped in Timely Dataflow, introduces flow control at critical locations in the computation, keeping the computation stable and resource-bound while introducing at most 20% runtime overhead over an unconstrained implementation. Our experience is that the information Timely Dataflow provides to user logic is sufficient for a variety of flow control and scheduling tasks, and merits further investigation.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125641213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Tight bounds on one- and two-pass MapReduce algorithms for matrix multiplication
Prakash V. Ramanan, A. Nagar
{"title":"Tight bounds on one- and two-pass MapReduce algorithms for matrix multiplication","authors":"Prakash V. Ramanan, A. Nagar","doi":"10.1145/2926534.2926542","DOIUrl":"https://doi.org/10.1145/2926534.2926542","url":null,"abstract":"We study one- and two-pass MapReduce algorithms for multiplying two matrices. First, consider one-pass algorithms. In the literature, there is a tight bound for the tradeoff between communication cost and parallelism. It measures communication cost using the replication rate r, and measures parallelism by reducer size q. It gives a tight bound on qr for multiplying dense square matrices. We extend it in two different ways: First, to sparse rectangular matrices; second, to a different measure of parallelism, namely, reducer workload w. We present tight bounds on qr and wr^2, for multiplying sparse rectangular matrices. We also show that the lower bound on qr follows from the lower bound on wr^2; so, the lower bound on wr^2 is stronger. Next, consider two-pass algorithms. It has been shown that, for a given reducer size, the two-pass algorithm has less communication cost than the one-pass algorithm. We present tight bounds on q_f r_f r_s and w_f r_f^2 r_s, for multiplying dense rectangular matrices; the subscripts f and s correspond to the first and second pass, respectively. Also, using our bound on q_f r_f r_s, we present a tight bound on the total communication cost as a function of q_f. Our lower bounds hold for the class of two-pass algorithms that perform all the real number multiplications in the first pass.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121537439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
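As a rough illustration of the one-pass tradeoff between reducer size q and replication rate r mentioned in the abstract above, the following Python sketch simulates a simple band-tiling scheme for dense square matrices. It is not the construction from the paper (which covers sparse rectangular matrices); the function name `one_pass_matmul`, the parameter `g` (number of row and column bands), and the assumption that `g` divides `n` are all chosen here for illustration.

```python
from collections import defaultdict


def one_pass_matmul(A, B, g):
    """Simulate a one-pass MapReduce product of two n x n matrices.

    Rows of A and columns of B are each split into g contiguous bands
    (g is assumed to divide n). Reducer (i, j) receives row band i of A
    and column band j of B, and computes the matching block of C.
    Returns (C, q, r): the product, the largest reducer input size,
    and the average number of reducers each input element is sent to.
    """
    n = len(A)
    band = n // g
    reducers = defaultdict(list)  # reducer key -> list of received inputs

    # Map phase: each element of A is sent to g reducers, and so is
    # each element of B.
    for row in range(n):
        for col in range(n):
            for j in range(g):
                reducers[(row // band, j)].append(("A", row, col, A[row][col]))
            for i in range(g):
                reducers[(i, col // band)].append(("B", row, col, B[row][col]))

    # Reduce phase: each reducer fills in one band x band block of C.
    C = [[0] * n for _ in range(n)]
    for (i, j), inputs in reducers.items():
        a = {(r_, c_): v for tag, r_, c_, v in inputs if tag == "A"}
        b = {(r_, c_): v for tag, r_, c_, v in inputs if tag == "B"}
        for r_ in range(i * band, (i + 1) * band):
            for c_ in range(j * band, (j + 1) * band):
                C[r_][c_] = sum(a[(r_, k)] * b[(k, c_)] for k in range(n))

    q = max(len(v) for v in reducers.values())
    r = sum(len(v) for v in reducers.values()) / (2 * n * n)
    return C, q, r
```

In this scheme each reducer receives 2n²/g inputs and every input is replicated g times, so increasing g shrinks the reducers at the cost of more communication, while the product q·r stays at 2n².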
DFA minimization in map-reduce
G. Grahne, Shahab Harrafi, Iraj Hedayati, A. Moallemi
{"title":"DFA minimization in map-reduce","authors":"G. Grahne, Shahab Harrafi, Iraj Hedayati, A. Moallemi","doi":"10.1145/2926534.2926537","DOIUrl":"https://doi.org/10.1145/2926534.2926537","url":null,"abstract":"We describe Map-Reduce implementations of two of the most prominent DFA minimization methods, namely Moore's and Hopcroft's algorithms. Our analysis shows that the one based on Hopcroft's algorithm is more efficient, both in terms of running time and communication cost. This is validated by our extensive experiments on various types of DFAs, with up to 2^17 states. It also turns out that both algorithms are sensitive to skewed input, Hopcroft's algorithm being intrinsically so.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132351969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
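For readers unfamiliar with Moore's algorithm, named in the abstract above, the following is a minimal sequential sketch of its partition-refinement idea in Python. It is not the Map-Reduce implementation the paper describes; the function name `moore_minimize` and the DFA encoding (a dict from `(state, symbol)` to state, assumed to be complete) are assumptions made for this example.

```python
def moore_minimize(states, alphabet, delta, accepting):
    """Group the states of a complete DFA into equivalence classes.

    delta maps (state, symbol) -> state and must be defined for every
    pair. Returns a dict mapping each state to a block id; states with
    the same id are indistinguishable and can be merged.
    """
    # Initial partition: accepting versus non-accepting states.
    block = {s: int(s in accepting) for s in states}
    while True:
        # A state's signature is its own block followed by the blocks
        # of its successors, one per input symbol.
        signature = {
            s: (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
            for s in states
        }
        # Renumber the distinct signatures to get the refined partition.
        ids = {}
        new_block = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        # Refinement only ever splits blocks, so an unchanged block
        # count means the partition is stable.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block
```

Each round of the loop depends only on the previous round's block labels, which is the kind of per-round structure that lends itself to one Map-Reduce job per iteration.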
Toward elastic memory management for cloud data analytics
Jingjing Wang, M. Balazinska
{"title":"Toward elastic memory management for cloud data analytics","authors":"Jingjing Wang, M. Balazinska","doi":"10.1145/2926534.2926541","DOIUrl":"https://doi.org/10.1145/2926534.2926541","url":null,"abstract":"We present several key elements towards elastic memory management in modern big data systems. The goal of our approach is to avoid out-of-memory failures without over-provisioning but also to avoid garbage-collection overheads when possible.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126956376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Model-centric computation abstractions in machine learning applications
Bingjing Zhang, Bo Peng, J. Qiu
{"title":"Model-centric computation abstractions in machine learning applications","authors":"Bingjing Zhang, Bo Peng, J. Qiu","doi":"10.1145/2926534.2926539","DOIUrl":"https://doi.org/10.1145/2926534.2926539","url":null,"abstract":"We categorize parallel machine learning applications into four types of computation models and propose a new set of model-centric computation abstractions. This work sets up parallel machine learning as a combination of training data-centric and model parameter-centric processing. The analysis uses Latent Dirichlet Allocation (LDA) as an example, and experimental results show that an efficient parallel model update pipeline can achieve similar or higher model convergence speed compared with other work.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127899899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Some pairs problems
J. Ullman, Jonathan Ullman
{"title":"Some pairs problems","authors":"J. Ullman, Jonathan Ullman","doi":"10.1145/2926534.2926543","DOIUrl":"https://doi.org/10.1145/2926534.2926543","url":null,"abstract":"A common form of MapReduce application involves discovering relationships between certain pairs of inputs. Similarity joins serve as a good example of this type of problem, which we call a \"some-pairs\" problem. In the framework of [4], algorithms are measured by the tradeoff between reducer size (maximum number of inputs a reducer can handle) and the replication rate (average number of reducers to which an input must be sent). There are two obvious approaches to solving some-pairs problems in general. We show that no general-purpose MapReduce algorithm can beat both of these two algorithms in the worst case. We then explore a recursive algorithm for solving some-pairs problems and heuristics for beating the lower bound on common instances of the some-pairs class of problems.","PeriodicalId":393776,"journal":{"name":"Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122331611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1