Indigo3: A Parallel Graph Analytics Benchmark Suite for Exploring Implementation Styles and Common Bugs
Yiqian Liu, Noushin Azami, Avery VanAusdal, Martin Burtscher
ACM Transactions on Parallel Computing (IF 1.6). Published 2024-05-15. DOI: 10.1145/3665251

Abstract: Graph analytics codes are widely used and tend to exhibit input-dependent behavior, making them particularly interesting for software verification and validation. This paper presents Indigo3, a labeled benchmark suite based on 7 graph algorithms implemented in different styles, including versions with deliberately planted bugs. We systematically combine 13 sets of implementation styles and 15 common bug types to create the 41,790 CUDA, OpenMP, and parallel C programs in the suite. Each code is labeled with the styles and bugs it incorporates. We used 4 subsets of Indigo3 to test 5 program-verification tools. Our results show that the tools perform quite differently across the bug types and implementation styles, have distinct strengths and weaknesses, and generally struggle with graph codes. We discuss the styles and bugs that tend to be the most challenging, as well as the programming patterns that yield false positives.
TLPGNN: A Lightweight Two-Level Parallelism Paradigm for Graph Neural Network Computation on Single and Multiple GPUs
Qiang Fu, Yuede Ji, Thomas B. Rolinger, H. H. Huang
ACM Transactions on Parallel Computing (IF 1.6). Published 2024-02-09. DOI: 10.1145/3644712

Abstract: Graph Neural Networks (GNNs) are an emerging class of deep learning models designed for graph-structured data. They have been effectively employed in a variety of real-world applications, including recommendation systems, drug development, and social-network analysis. GNN computation comprises regular neural-network operations and general graph convolution operations, with the latter taking most of the total computation time. Although several recent works have been proposed to accelerate GNN computation, they suffer from heavy pre-processing, low-efficiency atomic operations, and unnecessary kernel launches. In this paper, we design TLPGNN, a lightweight two-level parallelism paradigm for GNN computation. First, we conduct a systematic analysis of the hardware resource usage of GNN workloads to understand their characteristics in depth. Guided by these observations, we divide the GNN computation into two levels: vertex parallelism at the first level and feature parallelism at the second. Next, we employ a novel hybrid dynamic workload assignment to address imbalanced workload distribution. Furthermore, we fuse kernels to reduce the number of kernel launches and cache frequently accessed data in registers to avoid unnecessary memory traffic. To scale TLPGNN to multi-GPU environments, we propose an edge-aware row-wise 1-D partitioning method that balances the workload across GPU devices. Experimental results on various benchmark datasets demonstrate the superiority of our approach, which achieves substantial performance improvements over state-of-the-art GNN computation systems, including Deep Graph Library (DGL), GNNAdvisor, and FeatGraph, with average speedups of 6.1×, 7.7×, and 3.0×, respectively. Evaluation of multi-GPU TLPGNN also shows that our solution achieves both linear scalability and a well-balanced workload distribution.