Performance characteristics of Graph500 on large-scale distributed environment

T. Suzumura, Koji Ueno, Hitoshi Sato, K. Fujisawa, S. Matsuoka
{"title":"Performance characteristics of Graph500 on large-scale distributed environment","authors":"T. Suzumura, Koji Ueno, Hitoshi Sato, K. Fujisawa, S. Matsuoka","doi":"10.1109/IISWC.2011.6114175","DOIUrl":null,"url":null,"abstract":"Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory super-computers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation “replicated-csr” based on distributed level-synchronized breadth-first search solves a large free graph problem with 231 vertices and 235 edges (approximately 2.15 billon vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes and numbers of computer nodes and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"59","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Symposium on Workload Characterization (IISWC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2011.6114175","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 59

Abstract

Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory super-computers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation “replicated-csr” based on distributed level-synchronized breadth-first search solves a large free graph problem with 231 vertices and 235 edges (approximately 2.15 billon vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes and numbers of computer nodes and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.
Graph500在大规模分布式环境下的性能特征
Graph500是基于大规模图分析的超级计算机的新基准,它正在成为许多实际应用中的重要分析形式。图形算法在具有共享内存的超级计算机上运行良好。对于基于linpack的超级计算机排名,TOP500报告称,拥有大量gpgpu的异构和分布式内存超级计算机正在占据主导地位。然而,到目前为止,分布式内存超级计算机上的大规模图形分析基准(如Graph500)的性能特征还很少得到研究。这是在基于商用处理器的分布式内存超级计算机上对Graph500进行性能评估和分析的第一份报告。我们发现,基于分布式水平同步宽度优先搜索的参考实现“replicated-csr”在3.09秒内解决了一个包含231个顶点和235个边(约21.5亿个顶点和343亿个边)的大型自由图问题,拥有128个节点和3,072个内核。这相当于每秒遍历11gb的边。我们描述了Graph500参考实现的算法和实现,并分析了不同图大小和计算机节点数量以及不同实现时的性能特征。我们的结果也将有助于为即将到来的百亿亿次机器开发优化算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信