Hybrid BFS Approach Using Semi-external Memory

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI:10.1109/IPDPSW.2014.189

Keita Iwabuchi, Hitoshi Sato, Ryo Mizote, Yuichiro Yasui, K. Fujisawa, S. Matsuoka


Citations: 8

Abstract

NVM devices will greatly expand the possibility of processing extremely large-scale graphs that exceed the DRAM capacity of a node. However, efficient implementations based on detailed performance analysis of the access patterns of unstructured graph kernels on systems that combine DRAM and NVM devices have not been well investigated. We introduce a graph data offloading technique using NVMs that augments the hybrid BFS (breadth-first search) algorithm widely used in the Graph500 benchmark, and we conduct a performance analysis to demonstrate the utility of NVMs for unstructured data. Experimental results for a SCALE 27 problem on a Kronecker graph compliant with the Graph500 benchmark show that our approach sustains up to 4.22 Giga TEPS (Traversed Edges Per Second), reducing DRAM size by half with only 19.18% performance degradation on a 4-way AMD Opteron 6172 machine heavily equipped with NVM devices. Although direct comparison is difficult, this is significantly greater than the 0.05 GTEPS for a SCALE 36 problem using 1 TB of DRAM and 12 TB of NVM reported by Pearce et al. Although our approach uses a higher DRAM-to-NVM ratio, we show that a good compromise between performance and capacity ratio is achievable for processing large-scale graphs. This result, together with a detailed performance analysis of the proposed technique, suggests that extremely large-scale graphs can be processed per node with minimal performance degradation by carefully considering the data structures of a given graph and the access patterns to both DRAM and NVM devices. As a result, our implementation achieved 4.35 MTEPS/W (Mega TEPS per Watt) and ranked 4th on the November 2013 edition of the Green Graph500 list in the Big Data category, using only a single fat server heavily equipped with NVMs.
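For readers unfamiliar with the hybrid BFS named in the abstract, the following is a minimal in-memory sketch of the general idea: the traversal switches between a top-down step (expand the frontier) and a bottom-up step (each unvisited vertex looks for a parent in the frontier). It is not the authors' implementation and does not model NVM offloading. The function name `hybrid_bfs`, the adjacency-list input, and the switching parameters `alpha` and `beta` with their default values are illustrative assumptions, and the graph is assumed to be undirected.

```python
from typing import List


def hybrid_bfs(adj: List[List[int]], source: int,
               alpha: float = 14.0, beta: float = 24.0) -> List[int]:
    """Return a BFS parent array (-1 means unreached) for an undirected graph.

    alpha and beta are hypothetical tuning parameters for the direction
    switch; the values here are illustrative, not taken from the paper.
    """
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    frontier = [source]
    unexplored_edges = sum(len(neighbors) for neighbors in adj)
    bottom_up = False

    while frontier:
        frontier_edges = sum(len(adj[v]) for v in frontier)
        # Switch to bottom-up when the frontier touches a large share of the
        # remaining edges; switch back once the frontier becomes small again.
        if not bottom_up and frontier_edges > unexplored_edges / alpha:
            bottom_up = True
        elif bottom_up and len(frontier) < n / beta:
            bottom_up = False
        unexplored_edges -= frontier_edges

        next_frontier = []
        if bottom_up:
            # Bottom-up: every unvisited vertex scans its neighbors and stops
            # at the first one that is in the current frontier.
            in_frontier = [False] * n
            for v in frontier:
                in_frontier[v] = True
            for v in range(n):
                if parent[v] == -1:
                    for u in adj[v]:
                        if in_frontier[u]:
                            parent[v] = u
                            next_frontier.append(v)
                            break
        else:
            # Top-down: every frontier vertex claims its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.append(v)
        frontier = next_frontier

    return parent
```

Both step types read the adjacency data, but with different access patterns, which is the kind of difference the abstract says must be considered when deciding which graph data to keep in DRAM and which to offload to NVM.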

