High Performance Multilevel Graph Partitioning on GPU
B. Goodarzi, Farzad Khorasani, Vivek Sarkar, D. Goswami
2019 International Conference on High Performance Computing & Simulation (HPCS), published 2019-07-01
DOI: 10.1109/HPCS48598.2019.9188120 (https://doi.org/10.1109/HPCS48598.2019.9188120)
Citations: 2
Abstract
Graph partitioning is a common computational phase in many application domains, including social network analysis, data mining, scheduling, and VLSI design. The significant SIMT compute power of a GPU makes it an appropriate platform for exploiting data parallelism in graph partitioning and accelerating the computation. However, the sub-tasks of graph partitioning are irregular, non-uniform, and data-dependent, which poses multiple challenges for efficient GPU utilization, including load imbalance, non-coalesced memory accesses, and warp execution inefficiency. In this paper, we describe an effective and methodical approach to multi-level graph partitioning on GPUs. Our solution avoids thread divergence and balances the load across GPU threads by dynamically assigning an appropriate number of threads to each graph vertex and its irregularly sized neighbor list. Our design is autonomous, i.e., all steps are carried out by the GPU with minimal CPU involvement, which suits the range of GPU applications that require graph partitioning as a pre-processing step. We show that our approach outperforms the state-of-the-art CPU-based parallel graph partitioner (mtmetis) while achieving comparable partitioning quality. Moreover, to the best of our knowledge, it is the first autonomous approach on GPU.
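The abstract does not spell out how threads are assigned to vertices, so the following is only a minimal CPU-side sketch of one common way to match thread count to neighbor-list size, not the authors' scheme. It assumes a CSR graph layout, a warp size of 32, and a hypothetical rule that gives each vertex the smallest power-of-two group of threads covering its degree. The function names `threads_for_degree` and `weighted_degree` are illustrative. Each emulated lane strides over the neighbor list, and the partial sums are combined with a tree reduction, mimicking what a cooperating thread group would do on the GPU.

```python
WARP_SIZE = 32  # assumed number of threads per warp


def threads_for_degree(degree, warp_size=WARP_SIZE):
    """Hypothetical rule: smallest power of two >= degree, clamped to [1, warp_size]."""
    group = 1
    while group < degree and group < warp_size:
        group *= 2
    return group


def weighted_degree(row_offsets, edge_weights, v):
    """Emulate one thread group summing the edge weights of vertex v (CSR layout).

    Returns (group_size, sum_of_edge_weights).
    """
    start, end = row_offsets[v], row_offsets[v + 1]
    group = threads_for_degree(end - start)

    # Each lane visits edges start+lane, start+lane+group, ... so that
    # neighboring lanes touch neighboring memory locations.
    partial = [0] * group
    for lane in range(group):
        for e in range(start + lane, end, group):
            partial[lane] += edge_weights[e]

    # Tree reduction across the lanes of the group.
    stride = group // 2
    while stride > 0:
        for lane in range(stride):
            partial[lane] += partial[lane + stride]
        stride //= 2
    return group, partial[0]


# Toy graph with vertex degrees 1, 3, 5, and 0.
row_offsets = [0, 1, 4, 9, 9]
edge_weights = [2, 1, 1, 1, 3, 3, 3, 3, 3]

results = [weighted_degree(row_offsets, edge_weights, v) for v in range(4)]
for v, (group, total) in enumerate(results):
    print(f"vertex {v}: {group} thread(s), weighted degree {total}")
```

Under this rule, low-degree vertices occupy only a few lanes while high-degree vertices receive up to a full warp, which is the kind of per-vertex load balancing the abstract refers to. A real GPU kernel would additionally have to pack several small groups into one warp and handle vertices whose degree exceeds the warp size.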