Multilevel Parallelism for the Exploration of Large-Scale Graphs

Massimo Bernaschi;Mauro Bisson;Enrico Mastrostefano;Flavio Vella
IEEE Transactions on Multi-Scale Computing Systems, vol. 4, no. 3, pp. 204-216. Published 2018-01-23.
DOI: 10.1109/TMSCS.2018.2797195
URL: https://ieeexplore.ieee.org/document/8267334/
Citations: 9

Abstract

We present the most recent release of our parallel implementation of the breadth-first search (BFS) and betweenness centrality (BC) algorithms for the study of large-scale graphs. Although our reference platform is a high-end cluster of new-generation Nvidia GPUs and some of our optimizations are CUDA-specific, most of our ideas can be applied to other platforms that offer multiple levels of parallelism. We exploit multilevel parallel processing through a hybrid programming paradigm that combines highly tuned CUDA kernels, for the computations performed by each node, with explicit data exchange through the Message Passing Interface (MPI), for the communications among nodes. The results of the numerical experiments show that the performance of our code is comparable to or better than that of other state-of-the-art solutions. For the BFS, for instance, we reach a peak performance of 200 GigaTEPS (billions of traversed edges per second) on a single GPU and 5.5 TeraTEPS on 1024 Pascal GPUs. We release our source code both for reproducing the results and for facilitating its use as a building block for the implementation of other algorithms.
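The hybrid scheme described in the abstract alternates a per-node compute step with an inter-node exchange step. The following is a minimal sketch of that pattern for a level-synchronous BFS, written in plain Python rather than CUDA and MPI. It is not the authors' implementation: the function names, the modulo-based 1D vertex ownership rule, and the in-memory "outboxes" that stand in for MPI messages are all assumptions made for illustration.

```python
from collections import defaultdict


def partitioned_bfs(edges, num_vertices, root, num_ranks):
    """Level-synchronous BFS on an undirected graph split across simulated ranks.

    Illustrative only: `owner` is a hypothetical 1D partitioning rule, and the
    nested `outboxes` lists stand in for an MPI all-to-all exchange.
    """
    def owner(v):
        return v % num_ranks  # assumed ownership rule, not taken from the paper

    # Each simulated rank stores adjacency lists only for the vertices it owns.
    local_adj = [defaultdict(list) for _ in range(num_ranks)]
    for u, v in edges:
        local_adj[owner(u)][u].append(v)
        local_adj[owner(v)][v].append(u)

    # For brevity the level array is shared here; in a distributed run each
    # rank would hold only the slice for the vertices it owns.
    level = [-1] * num_vertices
    level[root] = 0
    frontiers = [[] for _ in range(num_ranks)]
    frontiers[owner(root)].append(root)
    depth = 0
    edges_scanned = 0

    while any(frontiers):
        depth += 1
        # Compute step (the role played by a GPU kernel): every rank expands
        # its local frontier and buckets the neighbours by owning rank.
        outboxes = [[[] for _ in range(num_ranks)] for _ in range(num_ranks)]
        for rank in range(num_ranks):
            for u in frontiers[rank]:
                for v in local_adj[rank][u]:
                    edges_scanned += 1
                    outboxes[rank][owner(v)].append(v)

        # Exchange step (the role played by MPI): each owner receives the
        # candidates and keeps only vertices that were not visited before.
        frontiers = [[] for _ in range(num_ranks)]
        for src in range(num_ranks):
            for dst in range(num_ranks):
                for v in outboxes[src][dst]:
                    if level[v] == -1:
                        level[v] = depth
                        frontiers[dst].append(v)

    return level, edges_scanned


def teps(edges_traversed, seconds):
    """Traversed edges per second: edge count divided by elapsed time."""
    return edges_traversed / seconds
```

The result does not depend on the number of simulated ranks; only the volume of data crossing the exchange step changes, which is the cost that a multi-GPU implementation has to keep low.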