Round compression for parallel matching algorithms

A. Czumaj, Jakub Lacki, A. Madry, Slobodan Mitrovic, Krzysztof Onak, P. Sankowski
{"title":"Round compression for parallel matching algorithms","authors":"A. Czumaj, Jakub Lacki, A. Madry, Slobodan Mitrovic, Krzysztof Onak, P. Sankowski","doi":"10.1145/3188745.3188764","DOIUrl":null,"url":null,"abstract":"For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem—one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(logn) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n1+Ω(1) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly O(logn) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that possibility. That is, we break the above O(logn) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+є)-approximate maximum matching, for any fixed constant є>0, in O((loglogn)2) rounds. To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex–based graph partitioning, instead of the edge–based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(logn) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.","PeriodicalId":20593,"journal":{"name":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"93","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3188745.3188764","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 93

Abstract

For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context, though, is: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem, one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in O(log n) rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. (SPAA 2011) showed that if each machine has n^(1+Ω(1)) memory, this problem can also be solved 2-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow-up work, seem, though, to get stuck in a fundamental way at roughly O(log n) rounds once we enter the (at most) near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that possibility. That is, we break the above O(log n) round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a (2+ε)-approximate maximum matching, for any fixed constant ε > 0, in O((log log n)^2) rounds. To establish our result we need to deviate from the previous work in two important ways that are crucial for exploiting the power of the MPC model, as compared to the PRAM model. Firstly, we use vertex-based graph partitioning, instead of the edge-based approaches that were utilized so far. Secondly, we develop a technique of round compression. This technique enables one to take a (distributed) algorithm that computes an O(1)-approximation of maximum matching in O(log n) independent PRAM phases and implement a super-constant number of these phases in only a constant number of MPC rounds.
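
The two techniques named above can be made concrete with a small simulation. Below is a minimal, self-contained Python sketch, not the authors' implementation: it pairs vertex-based random partitioning with a plain greedy maximal-matching pass on each part (greedy on the whole graph is the textbook 2-approximation), whereas the paper compresses phases of a specific distributed peeling algorithm. All function names and parameters here are hypothetical, and the "machines" are simulated within one process.

```python
# Illustrative sketch only (not the paper's algorithm): vertex-based
# partitioning plus a local matching pass per machine, iterated over a
# few simulated MPC rounds. Names and parameters are hypothetical.
import random
from collections import defaultdict

def vertex_partition(edges, num_machines, rng):
    """Vertex-based partitioning: each vertex is assigned to a random
    machine, and a machine keeps only the edges with BOTH endpoints it
    owns (the induced subgraph), unlike edge-based schemes that split
    the edge set directly."""
    owner = {}
    for u, v in edges:
        for x in (u, v):
            if x not in owner:
                owner[x] = rng.randrange(num_machines)
    parts = defaultdict(list)
    for u, v in edges:
        if owner[u] == owner[v]:
            parts[owner[u]].append((u, v))
    return parts

def greedy_matching(edges, matched):
    """Greedy pass: match both endpoints of any edge whose endpoints are
    still free. Run on the whole graph, this yields a maximal matching,
    which is a 2-approximation of maximum matching."""
    local = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            local.append((u, v))
    return local

def one_mpc_round(edges, num_machines, matched, rng):
    """One simulated MPC round: partition by vertices, then let every
    machine run an entire local computation (here, a full greedy pass).
    Packing much local work into one round is the round-compression
    idea in miniature."""
    matching = []
    for part in vertex_partition(edges, num_machines, rng).values():
        matching += greedy_matching(part, matched)
    # Edges with a matched endpoint can be dropped before the next round.
    remaining = [(u, v) for u, v in edges
                 if u not in matched and v not in matched]
    return matching, remaining

if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(rng.randrange(200), rng.randrange(200)) for _ in range(600)]
    edges = [(u, v) for u, v in edges if u != v]  # drop self-loops
    matched, matching = set(), []
    for _ in range(5):  # a few simulated MPC rounds
        found, edges = one_mpc_round(edges, 8, matched, rng)
        matching += found
    print(f"matched {len(matching)} pairs; {len(edges)} edges left")
```

Note that vertex-based partitioning discards edges whose endpoints land on different machines; a central part of the paper's analysis is showing that, for the right local algorithm, the induced subgraphs retain enough of the matching for the compressed rounds to keep making progress.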