{"title":"利用图结构的分布式内存三角形计数","authors":"Sayan Ghosh, M. Halappanavar","doi":"10.1109/HPEC43674.2020.9286167","DOIUrl":null,"url":null,"abstract":"Graph analytics has emerged as an important tool in the analysis of large scale data from diverse application domains such as social networks, cyber security and bioinformatics. Counting the number of triangles in a graph is a fundamental kernel with several applications such as detecting the community structure of a graph or in identifying important vertices in a graph. The ubiquity of massive datasets is driving the need to scale graph analytics on parallel systems. However, numerous challenges exist in efficiently parallelizing graph algorithms, especially on distributed-memory systems. Irregular memory accesses and communication patterns, low computation to communication ratios, and the need for frequent synchronization are some of the leading challenges. In this paper, we present TriC, our distributed-memory implementation of triangle counting in graphs using the Message Passing Interface (MPI), as a submission to the 2020 Graph Challenge competition. Using a set of synthetic and real-world inputs from the challenge, we demonstrate a speedup of up to 90 x relative to previous work on 32 processor-cores of a NERSC Cori node. We also provide details from distributed runs with up to 8192 processes along with strong scaling results. The observations presented in this work provide an understanding of the system-level bottlenecks at scale that specifically impact sparse-irregular workloads and will therefore benefit other efforts to parallelize graph algorithms.","PeriodicalId":168544,"journal":{"name":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"TriC: Distributed-memory Triangle Counting by Exploiting the Graph Structure\",\"authors\":\"Sayan Ghosh, M. Halappanavar\",\"doi\":\"10.1109/HPEC43674.2020.9286167\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph analytics has emerged as an important tool in the analysis of large scale data from diverse application domains such as social networks, cyber security and bioinformatics. Counting the number of triangles in a graph is a fundamental kernel with several applications such as detecting the community structure of a graph or in identifying important vertices in a graph. The ubiquity of massive datasets is driving the need to scale graph analytics on parallel systems. However, numerous challenges exist in efficiently parallelizing graph algorithms, especially on distributed-memory systems. Irregular memory accesses and communication patterns, low computation to communication ratios, and the need for frequent synchronization are some of the leading challenges. In this paper, we present TriC, our distributed-memory implementation of triangle counting in graphs using the Message Passing Interface (MPI), as a submission to the 2020 Graph Challenge competition. Using a set of synthetic and real-world inputs from the challenge, we demonstrate a speedup of up to 90 x relative to previous work on 32 processor-cores of a NERSC Cori node. We also provide details from distributed runs with up to 8192 processes along with strong scaling results. 
The observations presented in this work provide an understanding of the system-level bottlenecks at scale that specifically impact sparse-irregular workloads and will therefore benefit other efforts to parallelize graph algorithms.\",\"PeriodicalId\":168544,\"journal\":{\"name\":\"2020 IEEE High Performance Extreme Computing Conference (HPEC)\",\"volume\":\"84 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE High Performance Extreme Computing Conference (HPEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPEC43674.2020.9286167\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE High Performance Extreme Computing Conference (HPEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPEC43674.2020.9286167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
TriC: Distributed-memory Triangle Counting by Exploiting the Graph Structure
Graph analytics has emerged as an important tool for analyzing large-scale data from diverse application domains such as social networks, cyber security, and bioinformatics. Counting the number of triangles in a graph is a fundamental kernel with several applications, such as detecting the community structure of a graph or identifying its important vertices. The ubiquity of massive datasets is driving the need to scale graph analytics on parallel systems. However, numerous challenges exist in efficiently parallelizing graph algorithms, especially on distributed-memory systems. Irregular memory-access and communication patterns, low computation-to-communication ratios, and the need for frequent synchronization are some of the leading challenges. In this paper, we present TriC, our distributed-memory implementation of triangle counting in graphs using the Message Passing Interface (MPI), as a submission to the 2020 Graph Challenge competition. Using a set of synthetic and real-world inputs from the challenge, we demonstrate a speedup of up to 90x relative to previous work on 32 processor cores of a NERSC Cori node. We also provide details from distributed runs with up to 8192 processes, along with strong-scaling results. The observations presented in this work provide an understanding of the system-level bottlenecks at scale that specifically impact sparse, irregular workloads and will therefore benefit other efforts to parallelize graph algorithms.
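The abstract describes TriC only at a high level; the kernel it parallelizes is ordinary triangle counting over adjacency lists. Below is a minimal sequential sketch of that kernel, counting each triangle exactly once by intersecting the "forward" (higher-numbered) neighbor lists of every edge. This is an illustration only, not the authors' TriC code, and the toy graph in main is a hypothetical example. In the distributed setting the abstract refers to, the graph is partitioned across MPI processes, and triangles whose edges span partitions require communication, which is where the irregular communication patterns and low computation-to-communication ratios mentioned above arise.

```cpp
// Minimal sequential sketch of a triangle-counting kernel (NOT the authors'
// TriC implementation): for each edge (u, v) with u < v, count common
// neighbors w > v in the sorted adjacency lists, so every triangle {u, v, w}
// is counted exactly once at its two smallest vertices.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

using Graph = std::vector<std::vector<int>>;  // adjacency lists, sorted ascending

std::int64_t count_triangles(const Graph& adj) {
    std::int64_t triangles = 0;
    for (int u = 0; u < static_cast<int>(adj.size()); ++u) {
        for (int v : adj[u]) {
            if (v <= u) continue;  // consider each edge once, oriented low -> high
            // Common neighbors w with w > v close a triangle {u, v, w}.
            auto it_u = std::lower_bound(adj[u].begin(), adj[u].end(), v + 1);
            auto it_v = std::lower_bound(adj[v].begin(), adj[v].end(), v + 1);
            while (it_u != adj[u].end() && it_v != adj[v].end()) {
                if (*it_u < *it_v)      ++it_u;
                else if (*it_v < *it_u) ++it_v;
                else { ++triangles; ++it_u; ++it_v; }
            }
        }
    }
    return triangles;
}

int main() {
    // Hypothetical toy graph: a 4-clique on vertices 0..3 plus a pendant
    // vertex 4 attached to vertex 2 -> exactly 4 triangles.
    Graph adj = {
        {1, 2, 3},     // 0
        {0, 2, 3},     // 1
        {0, 1, 3, 4},  // 2
        {0, 1, 2},     // 3
        {2},           // 4
    };
    std::cout << "triangles: " << count_triangles(adj) << "\n";  // prints 4
    return 0;
}
```

The per-edge intersection of sorted neighbor lists is what makes the workload sparse and irregular: the work per edge depends on vertex degrees, which is one reason distributed-memory parallelizations such as the one reported in this paper face load-balance and communication challenges at scale.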