Randomized Composable Core-sets for Distributed Submodular Maximization

Proceedings of the forty-seventh annual ACM symposium on Theory of Computing. Pub Date: 2015-06-14. DOI: 10.1145/2746539.2746624

V. Mirrokni, Morteza Zadimoghaddam


Citations: 120

Abstract

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece and compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. This technique can be captured via the concept of composable core-sets, and has recently been applied to solve diversity maximization problems as well as several clustering problems [7,15,8]. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique [15]. In this paper, we focus on the efficient construction of a randomized variant of composable core-sets, where the above idea is applied to a random clustering of the data. We employ this technique for coverage, monotone, and non-monotone submodular maximization problems. Our results significantly improve upon the hardness results for non-randomized core-sets, and imply improved results for submodular maximization in distributed and streaming settings. The effectiveness of this technique has been confirmed empirically for several machine learning applications [22], and our proof provides a theoretical foundation for this idea. In summary, we show that a simple greedy algorithm results in a 1/3-approximate randomized composable core-set for submodular maximization under a cardinality constraint. Our result also extends to non-monotone submodular functions, and leads to the first 2-round MapReduce-based constant-factor approximation algorithm with O(n) total communication complexity for either monotone or non-monotone functions. Finally, using an improved analysis technique and a new algorithm, PseudoGreedy, we present an improved 0.545-approximation algorithm for monotone submodular maximization, which is in turn the first MapReduce-based algorithm to beat the factor 1/2 in a constant number of rounds.
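The two-round scheme described in the abstract (random partition, greedy on each piece, greedy on the union of the per-piece solutions) can be illustrated with a minimal Python sketch. This is not the authors' implementation and it is not PseudoGreedy. The function names `make_coverage`, `greedy`, and `randomized_coreset_maximize`, the coverage objective, and the sequential loop standing in for MapReduce workers are all assumptions made for illustration. The sketch does not verify any approximation bound.

```python
import random
from typing import Callable, Dict, Hashable, List, Sequence, Set

Objective = Callable[[Sequence[Hashable]], float]


def make_coverage(sets: Dict[Hashable, Set[int]]) -> Objective:
    """Return a coverage objective f(S) = |union of sets[x] for x in S|.

    Coverage is a standard monotone submodular function; it is used here
    only as an illustrative objective (an assumption, not from the paper).
    """
    def f(chosen: Sequence[Hashable]) -> float:
        covered: Set[int] = set()
        for x in chosen:
            covered |= sets[x]
        return float(len(covered))
    return f


def greedy(items: Sequence[Hashable], f: Objective, k: int) -> List[Hashable]:
    """Pick up to k items, each time adding the largest marginal gain."""
    chosen: List[Hashable] = []
    remaining = list(items)
    for _ in range(min(k, len(remaining))):
        base = f(chosen)
        best = max(remaining, key=lambda x: f(chosen + [x]) - base)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def randomized_coreset_maximize(
    items: Sequence[Hashable],
    f: Objective,
    k: int,
    num_parts: int,
    seed: int = 0,
) -> List[Hashable]:
    """Two-round sketch: random partition, local greedy, greedy on the union."""
    rng = random.Random(seed)

    # Round 1: send each item to a uniformly random part, then compute a
    # size-k greedy solution (the "core-set") on every part independently.
    parts: List[List[Hashable]] = [[] for _ in range(num_parts)]
    for x in items:
        parts[rng.randrange(num_parts)].append(x)
    coresets = [greedy(part, f, k) for part in parts]

    # Round 2: run greedy once more on the union of all core-sets.
    union = [x for coreset in coresets for x in coreset]
    return greedy(union, f, k)
```

Each part contributes at most k items to the second round, so the union holds at most `num_parts * k` items regardless of the input size. In a real distributed setting the per-part calls to `greedy` would run on separate machines; here they run in a plain loop.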



