Distributed, Shared-Memory Parallel Triangle Counting

Proceedings of the Platform for Advanced Scientific Computing Conference. Pub Date: 2018-07-02. DOI: 10.1145/3218176.3218229

Thejaka Amila Kanewala, Marcin Zalewski, A. Lumsdaine


Citations: 3

Abstract

Triangles are the most basic non-trivial subgraphs. Triangle counting is used in a number of applications, including social network mining, cyber security, and spam detection. In general, triangle counting algorithms are readily parallelizable, but when implemented on distributed, shared-memory systems they perform poorly because of high communication costs, work imbalance, and the difficulty of exploiting the locality available in shared memory. In this paper, we discuss four different (but related) triangle counting algorithms and how their performance can be improved on distributed, shared-memory systems by reducing in-node load imbalance, improving cache utilization, minimizing network overhead, and minimizing algorithmic work. We generalize the four algorithms into a common framework and show that, for all four, in-node load imbalance can be minimized while utilizing caches by partitioning work into blocks of vertices, network overhead can be minimized by aggregating blocks of work, and algorithmic work can be reduced by partitioning vertex neighbors by degree. We experimentally evaluate the weak and strong scaling performance of the proposed algorithms on two types of synthetic graph inputs and three real-world graph inputs. We also compare our implementations with the distributed, shared-memory triangle counting algorithms available in PowerGraph-GraphLab and show that our proposed algorithms outperform them in both space and time.
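To make two of the ideas in the abstract concrete, the following is a minimal single-process sketch, not the paper's implementation and not any one of its four algorithms. It assumes that "partitioning vertex neighbors by degree" can be illustrated by keeping, for each vertex, only the neighbors that rank higher by (degree, id), and that "blocks of vertices" can be illustrated by fixed-size chunks of the vertex list. All function names and the block_size parameter are hypothetical, and distribution, threading, and message aggregation are not modelled.

```python
from itertools import islice


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def higher_neighbors(adj):
    """Keep only neighbors that rank higher by (degree, vertex id).

    Assumption: this stands in for the degree-based neighbor partitioning
    mentioned in the abstract; the paper's exact scheme may differ.
    """
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    return {v: {w for w in nbrs if rank[w] > rank[v]} for v, nbrs in adj.items()}


def vertex_blocks(vertices, block_size):
    """Yield consecutive blocks of at most block_size vertices."""
    it = iter(vertices)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def count_block(block, higher):
    """Count triangles whose lowest-ranked vertex lies in this block."""
    total = 0
    for u in block:
        for v in higher[u]:
            # Any w ranked above both u and v and adjacent to both closes a triangle.
            total += len(higher[u] & higher[v])
    return total


def count_triangles(edges, block_size=2):
    """Sum per-block counts; each block is an independent unit of work."""
    adj = build_adjacency(edges)
    higher = higher_neighbors(adj)
    return sum(count_block(b, higher) for b in vertex_blocks(sorted(adj), block_size))


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))  # 4
```

Because every triangle is attributed to its lowest-ranked vertex, each one is counted exactly once, and the per-block counts can be computed independently and summed. In a real distributed, shared-memory setting the blocks would be handed to threads or nodes, which this sketch leaves out.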

