Scalable implementation of dependence clustering in Apache Spark

2017 Evolving and Adaptive Intelligent Systems (EAIS). Pub Date: 2017-05-01. DOI: 10.1109/EAIS.2017.7954843

E. Ivannikova

Citations: 5

Abstract

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. The article also introduces a fast approximate diffusion procedure that enables spectral-clustering-type algorithms to run in the Spark environment. The proposed algorithm is benchmarked against spectral clustering. Results of applying the method to real-life data indicate that the implementation scales well while also performing well on densely connected graphs.
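The abstract does not spell out the diffusion procedure itself. As a hedged illustration only, and not the paper's method, the sketch below shows a generic truncated diffusion: a small, fixed number of random-walk steps x ← D⁻¹Wx over a weighted edge list, the same kind of early-stopped iteration used in power iteration clustering. Each step only combines values along edges, which is why such an operation maps naturally onto GraphX message passing (for example `aggregateMessages`). The function name `approximate_diffusion`, the seed-vector initialisation, and the toy graph are assumptions chosen for illustration.

```python
from collections import defaultdict


def approximate_diffusion(edges, init, steps=3):
    """Generic truncated random-walk diffusion (illustrative sketch only).

    edges: iterable of (src, dst, weight) for an undirected weighted graph.
    init:  dict vertex -> initial value (e.g. 1.0 at a single seed vertex).
    Each step replaces every vertex value with the weighted average of its
    neighbours' values, i.e. one multiplication by D^-1 W. Stopping after a
    few steps instead of iterating to convergence is what makes the
    diffusion approximate and cheap.
    """
    # Build an adjacency list; each undirected edge is stored in both directions.
    neighbours = defaultdict(list)
    for u, v, w in edges:
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    # Weighted degree d_u = sum of incident edge weights.
    degree = {u: sum(w for _, w in nbrs) for u, nbrs in neighbours.items()}

    # Vertices not mentioned in `init` start at 0.
    values = {u: init.get(u, 0.0) for u in neighbours}

    for _ in range(steps):
        # One "message" round: every vertex gathers weighted neighbour
        # values, analogous to a single aggregateMessages pass in GraphX.
        values = {
            u: sum(w * values[v] for v, w in nbrs) / degree[u]
            for u, nbrs in neighbours.items()
        }
    return values


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles {0, 1, 2} and {3, 4, 5}
    # joined by a single bridge edge 2-3.
    toy_edges = [
        (0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
        (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
        (2, 3, 1.0),
    ]
    scores = approximate_diffusion(toy_edges, {0: 1.0}, steps=3)
    # After three steps, vertices 0, 1, 2 score higher than 3, 4, 5.
    for vertex in sorted(scores):
        print(vertex, round(scores[vertex], 4))
```

Seeding vertex 0 and stopping early leaves more "mass" inside the seed's densely connected group than across the sparse bridge, which is the intuition behind using short diffusions as a cheap stand-in for exact eigenvector computation.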
