Performance characteristics of Graph500 on large-scale distributed environment

2011 IEEE International Symposium on Workload Characterization (IISWC). Publication date: 2011-11-06. DOI: 10.1109/IISWC.2011.6114175

T. Suzumura, Koji Ueno, Hitoshi Sato, K. Fujisawa, S. Matsuoka


Citations: 59

Abstract

Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous, distributed-memory supercomputers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis of Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation "replicated-csr", which is based on a distributed level-synchronized breadth-first search, solves a large scale-free graph problem with 2^31 vertices and 2^35 edges (approximately 2.15 billion vertices and 34.3 billion edges) in 3.09 seconds on 128 nodes with 3,072 cores. This equates to 11 giga traversed edges per second (GTEPS). We describe the algorithms and implementations of the Graph500 reference implementations, and analyze their performance characteristics across varying graph sizes, numbers of compute nodes, and implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.
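The abstract names two ideas that a small example can make concrete: a level-synchronized breadth-first search over a CSR graph distributed across nodes, and the traversed-edges-per-second (TEPS) figure derived from it. The sketch below is a minimal single-process simulation under stated assumptions, not the Graph500 reference code: it assumes a 1D block partition of vertices across hypothetical "ranks", each rank holding CSR rows only for the vertices it owns, and a frontier bitmap that is replicated to every rank after each level (a list concatenation stands in for an MPI all-gather). The function names `build_csr`, `level_sync_bfs`, `traversed_edges`, and `teps` are hypothetical, and the edge count used for TEPS is an approximation that counts input edge tuples whose endpoint lies in the traversed component.

```python
from typing import List, Tuple


def build_csr(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build an undirected CSR graph: row offsets and column (neighbor) indices."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (n + 1)
    for i in range(n):
        offsets[i + 1] = offsets[i] + degree[i]
    cols = [0] * offsets[n]
    fill = offsets[:-1]  # slicing copies: next free slot per row
    for u, v in edges:
        cols[fill[u]] = v
        fill[u] += 1
        cols[fill[v]] = u
        fill[v] += 1
    return offsets, cols


def level_sync_bfs(n: int, offsets: List[int], cols: List[int],
                   root: int, num_ranks: int) -> List[int]:
    """Level-synchronized BFS returning a parent array (-1 = unreached).

    Assumption: each simulated rank owns a contiguous block of vertices and
    only writes parent/frontier entries for the vertices it owns.
    """
    bounds = [(r * n // num_ranks, (r + 1) * n // num_ranks)
              for r in range(num_ranks)]
    parent = [-1] * n
    parent[root] = root
    frontier = [False] * n          # replicated on every rank
    frontier[root] = True
    while any(frontier):
        local_chunks = []
        for lo, hi in bounds:       # each iteration stands in for one rank
            chunk = [False] * (hi - lo)
            for v in range(lo, hi):
                if parent[v] != -1:
                    continue        # already visited
                for k in range(offsets[v], offsets[v + 1]):
                    if frontier[cols[k]]:   # neighbor is in the current level
                        parent[v] = cols[k]
                        chunk[v - lo] = True
                        break
            local_chunks.append(chunk)
        # Stand-in for an all-gather: every rank receives the full next frontier.
        frontier = [bit for chunk in local_chunks for bit in chunk]
    return parent


def traversed_edges(edges: List[Tuple[int, int]], parent: List[int]) -> int:
    """Approximate edge count: input edges inside the reached component."""
    return sum(1 for u, _ in edges if parent[u] != -1)


def teps(edge_count: int, seconds: float) -> float:
    """Traversed edges per second."""
    return edge_count / seconds
```

Because the frontier is replicated, each rank can test its own unvisited vertices against it without sending parent updates to other ranks; the cost moves into the per-level all-gather of the bitmap. Plugging in the figures from the abstract, `print(f"{teps(2**35, 3.09) / 1e9:.2f} GTEPS")` prints `11.12 GTEPS`, consistent with the reported 11 GTEPS.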

