Yuichiro Yasui, K. Fujisawa, E. L. Goh, John Baron, A. Sugiura, Takashi Uchiyama
{"title":"基于numa的SGI UV系统的可扩展图遍历","authors":"Yuichiro Yasui, K. Fujisawa, E. L. Goh, John Baron, A. Sugiura, Takashi Uchiyama","doi":"10.1145/2915516.2915522","DOIUrl":null,"url":null,"abstract":"Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm based on Beamer's direction-optimizing algorithm for non-uniform memory access (NUMA)-based systems, in which the NUMA architecture was carefully considered. This paper presents our new implementation that reduces remote memory access in a top-down direction of direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation has achieved performance rates of 219 billion edges per second on a Kronecker graph with 234 vertices and 238 edges on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a shared-memory system on the current Graph500 list presented in November 2015, which includes our previous implementation.","PeriodicalId":20568,"journal":{"name":"Proceedings of the ACM Workshop on High Performance Graph Processing","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"NUMA-aware Scalable Graph Traversal on SGI UV Systems\",\"authors\":\"Yuichiro Yasui, K. Fujisawa, E. L. Goh, John Baron, A. Sugiura, Takashi Uchiyama\",\"doi\":\"10.1145/2915516.2915522\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm based on Beamer's direction-optimizing algorithm for non-uniform memory access (NUMA)-based systems, in which the NUMA architecture was carefully considered. This paper presents our new implementation that reduces remote memory access in a top-down direction of direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation has achieved performance rates of 219 billion edges per second on a Kronecker graph with 234 vertices and 238 edges on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a shared-memory system on the current Graph500 list presented in November 2015, which includes our previous implementation.\",\"PeriodicalId\":20568,\"journal\":{\"name\":\"Proceedings of the ACM Workshop on High Performance Graph Processing\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM Workshop on High Performance Graph Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2915516.2915522\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Workshop on High Performance Graph Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2915516.2915522","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
NUMA-aware Scalable Graph Traversal on SGI UV Systems
Breadth-first search (BFS) is one of the most fundamental processing algorithms in graph theory. We previously presented a scalable BFS algorithm based on Beamer's direction-optimizing algorithm for non-uniform memory access (NUMA)-based systems, in which the NUMA architecture was carefully considered. This paper presents our new implementation that reduces remote memory access in a top-down direction of direction-optimizing algorithm. We also discuss numerical results obtained on the SGI UV 2000 and UV 300 systems, which are shared-memory supercomputers based on a cache coherent (cc)-NUMA architecture that can handle thousands of threads on a single operating system. Our implementation has achieved performance rates of 219 billion edges per second on a Kronecker graph with 234 vertices and 238 edges on a rack of an SGI UV 300 system with 1,152 threads. This result exceeds the fastest entry for a shared-memory system on the current Graph500 list presented in November 2015, which includes our previous implementation.