NUMA-aware Scalable Graph Traversal on SGI UV Systems

Proceedings of the ACM Workshop on High Performance Graph Processing · Published: 2016-05-31 · DOI: 10.1145/2915516.2915522

Yuichiro Yasui, K. Fujisawa, E. L. Goh, John Baron, A. Sugiura, Takashi Uchiyama


Citations: 11

Abstract

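Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm for non-uniform memory access (NUMA)-based systems, based on Beamer's direction-optimizing algorithm and designed with careful attention to the NUMA architecture. This paper presents a new implementation that reduces remote memory access in the top-down direction of the direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache-coherent NUMA (cc-NUMA) architecture that can run thousands of threads under a single operating system. Our implementation achieved 219 billion edges per second on a Kronecker graph with 2^34 vertices and 2^38 edges, using 1,152 threads on one rack of an SGI UV 300 system. This result exceeds the fastest shared-memory entry on the Graph500 list published in November 2015, which includes our previous implementation.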
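Beamer's direction-optimizing BFS, which the abstract builds on, alternates between a top-down step (each frontier vertex scans its edges for unvisited neighbours) and a bottom-up step (each unvisited vertex searches its edges for a parent already in the frontier). The following is a minimal single-threaded Python sketch of that switching idea only; it does not model NUMA placement, threading, or the paper's remote-access reduction. The thresholds `alpha=14` and `beta=24` are commonly cited defaults from Beamer's work and are assumptions here, and the switch-back rule is simplified (it omits the "frontier is shrinking" condition).

```python
from typing import List


def direction_optimizing_bfs(adj: List[List[int]], source: int,
                             alpha: float = 14.0, beta: float = 24.0) -> List[int]:
    """Return a BFS parent array (-1 = unreached) for an undirected graph.

    adj[v] lists the neighbours of v; each undirected edge appears in both lists.
    Simplified sketch: alpha/beta are assumed tuning constants.
    """
    n = len(adj)
    parent = [-1] * n
    parent[source] = source
    frontier = [source]
    # m_u: edges incident to vertices that are still unvisited.
    m_u = sum(len(a) for a in adj) - len(adj[source])
    bottom_up = False

    while frontier:
        m_f = sum(len(adj[v]) for v in frontier)  # edges a top-down step would scan
        n_f = len(frontier)
        # Switching heuristic: go bottom-up when the frontier's edges dominate,
        # return to top-down once the frontier becomes small again.
        if not bottom_up and m_f > m_u / alpha:
            bottom_up = True
        elif bottom_up and n_f < n / beta:
            bottom_up = False

        next_frontier: List[int] = []
        if bottom_up:
            # Bitmap of the current frontier, as used by the bottom-up step.
            in_frontier = [False] * n
            for v in frontier:
                in_frontier[v] = True
            for v in range(n):
                if parent[v] == -1:
                    for u in adj[v]:
                        if in_frontier[u]:
                            parent[v] = u
                            next_frontier.append(v)
                            break  # one parent suffices: skip remaining edges
        else:
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.append(v)

        m_u -= sum(len(adj[v]) for v in next_frontier)
        frontier = next_frontier
    return parent


if __name__ == "__main__":
    # Path graph 0-1-2-3 plus an isolated vertex 4.
    print(direction_optimizing_bfs([[1], [0, 2], [1, 3], [2], []], 0))
```

Both directions produce a valid BFS tree, so only the amount of edge scanning differs; the early `break` in the bottom-up loop is what saves work when the frontier is large, as on Kronecker graphs.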




