Round compression for parallel matching algorithms

A. Czumaj, Jakub Lacki, A. Madry, Slobodan Mitrovic, Krzysztof Onak, P. Sankowski
{"title":"Round compression for parallel matching algorithms","authors":"A. Czumaj, Jakub Lacki, A. Madry, Slobodan Mitrovic, Krzysztof Onak, P. Sankowski","doi":"10.1145/3188745.3188764","DOIUrl":null,"url":null,"abstract":"For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem—one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(logn) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n1+Ω(1) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly O(logn) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that possibility. That is, we break the above O(logn) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+є)-approximate maximum matching, for any fixed constant є>0, in O((loglogn)2) rounds. To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex–based graph partitioning, instead of the edge–based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(logn) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"93","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3188745.3188764","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 93

Abstract

For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context, though, is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem, one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n^(1+Ω(1)) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem, though, to get stuck in a fundamental way at roughly O(log n) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+ε)-approximate maximum matching, for any fixed constant ε > 0, in O((log log n)^2) rounds. To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex-based graph partitioning, instead of the edge-based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(log n) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.
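
The two techniques named above can be made concrete with a small simulation. Below is a minimal, self-contained Python sketch, not the authors' implementation: it pairs vertex-based random partitioning with a plain greedy maximal-matching pass on each part (greedy on the whole graph is the textbook 2-approximation), whereas the paper compresses phases of a specific distributed peeling algorithm. All function names and parameters here are hypothetical, and the "machines" are simulated within one process.

```python
# Illustrative sketch only (not the paper's algorithm): vertex-based
# partitioning plus a local matching pass per machine, iterated over a
# few simulated MPC rounds. Names and parameters are hypothetical.
import random
from collections import defaultdict

def vertex_partition(edges, num_machines, rng):
    """Vertex-based partitioning: each vertex is assigned to a random
    machine, and a machine keeps only the edges with BOTH endpoints it
    owns (the induced subgraph), unlike edge-based schemes that split
    the edge set directly."""
    owner = {}
    for u, v in edges:
        for x in (u, v):
            if x not in owner:
                owner[x] = rng.randrange(num_machines)
    parts = defaultdict(list)
    for u, v in edges:
        if owner[u] == owner[v]:
            parts[owner[u]].append((u, v))
    return parts

def greedy_matching(edges, matched):
    """Greedy pass: match both endpoints of any edge whose endpoints are
    still free. Run on the whole graph, this yields a maximal matching,
    which is a 2-approximation of maximum matching."""
    local = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            local.append((u, v))
    return local

def one_mpc_round(edges, num_machines, matched, rng):
    """One simulated MPC round: partition by vertices, then let every
    machine run an entire local computation (here, a full greedy pass).
    Packing much local work into one round is the round-compression
    idea in miniature."""
    matching = []
    for part in vertex_partition(edges, num_machines, rng).values():
        matching += greedy_matching(part, matched)
    # Edges with a matched endpoint can be dropped before the next round.
    remaining = [(u, v) for u, v in edges
                 if u not in matched and v not in matched]
    return matching, remaining

if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(rng.randrange(200), rng.randrange(200)) for _ in range(600)]
    edges = [(u, v) for u, v in edges if u != v]  # drop self-loops
    matched, matching = set(), []
    for _ in range(5):  # a few simulated MPC rounds
        found, edges = one_mpc_round(edges, 8, matched, rng)
        matching += found
    print(f"matched {len(matching)} pairs; {len(edges)} edges left")
```

Note that vertex-based partitioning discards edges whose endpoints land on different machines; a central part of the paper's analysis is showing that, for the right local algorithm, the induced subgraphs retain enough of the matching for the compressed rounds to keep making progress.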