Multilevel Parallelism for the Exploration of Large-Scale Graphs

IEEE Transactions on Multi-Scale Computing Systems. Pub Date: 2018-01-23. DOI: 10.1109/TMSCS.2018.2797195

Massimo Bernaschi, Mauro Bisson, Enrico Mastrostefano, Flavio Vella

Volume 4, Issue 3, pp. 204-216. Journal Article. Available at: https://ieeexplore.ieee.org/document/8267334/

Citations: 9

Abstract

We present the most recent release of our parallel implementation of the Breadth-First Search (BFS) and Betweenness Centrality (BC) algorithms for the study of large-scale graphs. Although our reference platform is a high-end cluster of new-generation NVIDIA GPUs and some of our optimizations are CUDA-specific, most of our ideas can be applied to other platforms that offer multiple levels of parallelism. We exploit multilevel parallel processing through a hybrid programming paradigm that combines highly tuned CUDA kernels, for the computations performed by each node, with explicit data exchange through the Message Passing Interface (MPI), for the communication among nodes. The results of the numerical experiments show that the performance of our code is comparable to or better than that of other state-of-the-art solutions. For the BFS, for instance, we reach a peak performance of 200 GTEPS (giga traversed edges per second) on a single GPU and 5.5 TTEPS on 1024 Pascal GPUs. We release our source code both to allow the results to be reproduced and to facilitate its use as a building block for the implementation of other algorithms.
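The two-level structure described in the abstract (per-node computation followed by explicit exchange among nodes) can be illustrated with a minimal, single-process Python sketch. This is not the authors' implementation: the ranks are simulated in a loop rather than run under MPI, the inner loop stands in for what would be a CUDA kernel, and the round-robin 1D vertex partition, the function names `partitioned_bfs` and `gteps`, and the way edges are counted are all assumptions made for illustration only.

```python
from collections import defaultdict


def partitioned_bfs(edges, num_vertices, num_ranks, source):
    """Level-synchronous BFS over a simulated 1D vertex partition.

    Assumption: vertex v is owned by rank v % num_ranks (hypothetical choice).
    Returns (level, edges_scanned); level[v] is -1 for unreachable vertices.
    """
    def owner(v):
        return v % num_ranks

    # Each simulated rank stores adjacency lists only for the vertices it owns.
    local_adj = [defaultdict(list) for _ in range(num_ranks)]
    for u, v in edges:  # undirected graph: store both directions
        local_adj[owner(u)][u].append(v)
        local_adj[owner(v)][v].append(u)

    # For brevity the level array is global here; in a real distributed code
    # each rank would hold only the slice for its own vertices.
    level = [-1] * num_vertices
    level[source] = 0
    frontiers = [[] for _ in range(num_ranks)]
    frontiers[owner(source)].append(source)
    depth = 0
    edges_scanned = 0

    while any(frontiers):
        # Local phase (the part a GPU kernel would do): each rank expands
        # its own frontier and buckets neighbours by destination rank.
        outbox = [[[] for _ in range(num_ranks)] for _ in range(num_ranks)]
        for r in range(num_ranks):
            for u in frontiers[r]:
                for v in local_adj[r][u]:
                    edges_scanned += 1
                    outbox[r][owner(v)].append(v)

        # Exchange phase (the part MPI would do, e.g. an all-to-all):
        # each rank receives candidates and keeps the unvisited ones.
        depth += 1
        frontiers = [[] for _ in range(num_ranks)]
        for dst in range(num_ranks):
            for src in range(num_ranks):
                for v in outbox[src][dst]:
                    if level[v] == -1:
                        level[v] = depth
                        frontiers[dst].append(v)

    return level, edges_scanned


def gteps(edges_traversed, seconds):
    """Traversed edges per second, expressed in units of 10**9 (GTEPS)."""
    return edges_traversed / seconds / 1e9


if __name__ == "__main__":
    demo_edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    levels, scanned = partitioned_bfs(demo_edges, 6, 2, 0)
    print(levels, scanned)
```

The separation into a local phase and an exchange phase is the point of the sketch: the first is where node-level parallelism would apply and the second is where inter-node communication cost appears. How edges are counted for a TEPS figure depends on the benchmark convention, so `gteps` only shows the unit conversion.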
