Randomized Composable Coresets for Matching and Vertex Cover

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures. Publication date: 2017-05-23. DOI: 10.1145/3087556.3087581

Sepehr Assadi, S. Khanna

Citations: 49

Abstract

A common approach for designing scalable algorithms for massive data sets is to distribute the computation across, say, k machines and process the data using limited communication between them. A particularly appealing framework here is the simultaneous communication model, whereby each machine constructs a small representative summary of its own data and one obtains an approximate or exact solution from the union of the representative summaries. If the representative summaries needed for a problem are small, this results in a communication-efficient and round-optimal protocol (one requiring essentially no interaction between the machines). Well-known techniques for creating summaries include sampling, linear sketching, and composable coresets. These techniques have been used successfully to design communication-efficient solutions for many fundamental graph problems. However, two prominent problems are notably absent from the list of successes: the maximum matching problem and the minimum vertex cover problem. Indeed, it was shown recently that for both problems, even achieving a modest approximation factor of polylog(n) requires representative summaries of size Ω̃(n²), i.e., essentially no better summary exists than each machine simply sending its entire input graph.

The main insight of our work is that the intractability of matching and vertex cover in the simultaneous communication model is inherently connected to an adversarial partitioning of the underlying graph across machines. We show that when the underlying graph is randomly partitioned across machines, both problems admit randomized composable coresets of size Õ(n) that yield an Õ(1)-approximate solution. (Here and throughout the paper, Õ(·) notation suppresses polylog(n) factors, where n is the number of vertices in the graph.)
In other words, a small subgraph of the input graph at each machine can be identified as its representative summary, and the final answer is then obtained by simply running any maximum matching or minimum vertex cover algorithm on the union of these subgraphs. This results in an Õ(1)-approximation simultaneous protocol for these problems with Õ(nk) total communication when the input is randomly partitioned across k machines. We also prove that our results are optimal in a very strong sense: we not only rule out the existence of smaller randomized composable coresets for these problems, but in fact show that our Õ(nk) bound on total communication is optimal for any simultaneous communication protocol (i.e., not only for randomized coresets) for these two problems. Finally, by a standard application of composable coresets, our results also imply MapReduce algorithms with the same approximation guarantee in one or two rounds of communication, improving the previous best known round complexity for these problems.
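The simultaneous protocol described above (random partition, local summary per machine, one matching computation on the union) can be illustrated with a minimal Python sketch. It is not the authors' construction or analysis, and it rests on stated assumptions: (1) the input graph is bipartite, so a short augmenting-path routine (Kuhn's algorithm) stands in for a general-graph maximum matching algorithm such as Edmonds' blossom algorithm; (2) the random partition sends each edge to one of the k machines independently and uniformly at random; and (3) each machine's summary is a maximum matching of its local subgraph. The helper names `max_bipartite_matching`, `random_partition`, and `coreset_protocol` are hypothetical. Since a matching on n vertices has at most n/2 edges, the k summaries together contain at most nk/2 edges, which is consistent with the Õ(nk) total communication stated above.

```python
import random
from collections import defaultdict


def max_bipartite_matching(edges):
    """Maximum matching of a bipartite graph via augmenting paths (Kuhn's algorithm).

    `edges` is an iterable of (u, v) pairs with u on the left side and v on the
    right side. Returns the matching as a set of (u, v) pairs.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    match_right = {}  # right vertex -> left vertex it is currently matched to

    def try_augment(u, seen):
        # Search for an augmenting path that starts at left vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in list(adj):
        try_augment(u, set())
    return {(u, v) for v, u in match_right.items()}


def random_partition(edges, k, rng):
    """Assumed model: send each edge to one of k machines, chosen uniformly at random."""
    parts = [[] for _ in range(k)]
    for e in edges:
        parts[rng.randrange(k)].append(e)
    return parts


def coreset_protocol(edges, k, seed=0):
    """Hypothetical sketch: each machine sends a maximum matching of its local
    edges; the coordinator computes a maximum matching of their union.

    Returns (final_matching, total_summary_edges).
    """
    rng = random.Random(seed)
    parts = random_partition(edges, k, rng)
    # Each machine computes its summary independently (no interaction).
    summaries = [max_bipartite_matching(p) for p in parts]
    # Communication = total number of edges sent to the coordinator.
    communication = sum(len(s) for s in summaries)
    combined = set().union(*summaries)
    return max_bipartite_matching(combined), communication


if __name__ == "__main__":
    # Toy instance: the complete bipartite graph K_{6,6}.
    edges = [(f"a{i}", f"b{j}") for i in range(6) for j in range(6)]
    matching, communication = coreset_protocol(edges, k=3, seed=1)
    print(len(matching), communication)
```

The recursive augmenting-path search is chosen for brevity; on large graphs an iterative Hopcroft-Karp implementation would avoid Python's recursion limit and run faster. The sketch makes no claim about the approximation ratio; on small instances, comparing `len(matching)` against a maximum matching of the full edge list is one way to observe the effect of random partitioning empirically.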
