Correlation clustering in MapReduce

Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2014-08-24. DOI: 10.1145/2623330.2623743

Flavio Chierichetti, Nilesh N. Dalvi, Ravi Kumar

Citations: 73

Abstract

Correlation clustering is a basic primitive in the data miner's toolkit, with applications ranging from entity matching to social network analysis. Given a graph with signed edges, the goal in correlation clustering is to partition the nodes into clusters so as to minimize the number of disagreements. In this paper we obtain a new algorithm for correlation clustering. Our algorithm is easily implementable in computational models such as MapReduce and streaming, and runs in a small number of rounds. In addition, we show that our algorithm obtains an almost 3-approximation to the optimal correlation clustering. Experiments on huge graphs demonstrate the scalability of our algorithm and its applicability to data mining problems.
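
To make the objective concrete, the following is a minimal sketch of how disagreements can be counted, assuming a complete signed graph in which every pair of nodes not listed as positive is treated as negative. A disagreement is then a positive pair split across clusters or a negative pair placed in the same cluster. The sketch also includes the classic sequential randomized Pivot heuristic (Ailon, Charikar, and Newman), which is known to give a 3-approximation in expectation on complete graphs. It is shown only as an illustrative stand-in and is not the algorithm of this paper, which targets MapReduce and streaming. The function names `disagreements` and `pivot_clustering` and the toy graph are hypothetical.

```python
import random
from itertools import combinations


def disagreements(nodes, positive, clusters):
    """Count disagreements of a clustering on a complete signed graph.

    Assumption: `positive` is a set of frozenset pairs; every other pair
    of nodes is considered a negative edge.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    count = 0
    for u, v in combinations(nodes, 2):
        is_positive = frozenset((u, v)) in positive
        same_cluster = label[u] == label[v]
        # Positive pair split apart, or negative pair kept together.
        if is_positive != same_cluster:
            count += 1
    return count


def pivot_clustering(nodes, positive, rng=None):
    """Sequential randomized Pivot (illustrative, not the paper's method)."""
    rng = rng or random.Random()
    order = list(nodes)
    rng.shuffle(order)
    unclustered = set(nodes)
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue
        # The pivot takes all still-unclustered positive neighbors.
        cluster = {pivot} | {
            v for v in unclustered if frozenset((pivot, v)) in positive
        }
        clusters.append(cluster)
        unclustered -= cluster
    return clusters


# Toy instance: a positive triangle a-b-c plus a node d that is
# negative to everyone.
nodes = ["a", "b", "c", "d"]
positive = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "c")]}
clusters = pivot_clustering(nodes, positive, random.Random(0))
cost = disagreements(nodes, positive, clusters)
```

On this toy instance every pivot order yields the clusters {a, b, c} and {d}, which have zero disagreements, whereas putting each node in its own cluster or all nodes in one cluster costs three disagreements each.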
