Authors: Manohar Lal Das, Vishwesh Jatala, Gagan Raj Gupta
Published in: 2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC), 2022-12-01
DOI: 10.1109/HiPC56025.2022.00018 (https://doi.org/10.1109/HiPC56025.2022.00018)
Joint Partitioning and Sampling Algorithm for Scaling Graph Neural Network
Graph neural networks (GNNs) have emerged as a popular toolbox for solving complex problems on graph data structures. GNNs use machine learning techniques to learn vector representations of nodes and/or edges. Learning these representations demands a large amount of memory and computing power. Traditional shared-memory multiprocessors are insufficient to meet the computing requirements of real-world data; hence, research has gained momentum toward distributed GNNs. Scaling distributed GNNs poses the following challenges: (1) the input graph needs to be partitioned efficiently, (2) the cost of communication between compute nodes should be reduced, and (3) the sampling strategy should be chosen carefully to minimize the loss in accuracy. To address these challenges, we propose a joint partitioning and sampling algorithm, which partitions the input graph with weighted METIS and uses a biased sampling strategy to minimize total communication costs. We implemented our approach using the DistDGL framework and evaluated it on several real-world datasets. Compared to the state-of-the-art DistDGL implementation, we observe that our approach (1) reduces communication overhead by 53% on average, (2) requires less time to partition a graph, (3) shows improved accuracy, and (4) achieves a speedup of 1.5x on the OGB-Arxiv dataset.
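The abstract does not spell out the biased sampling rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm and not DistDGL code. It assumes that a partition assignment is already available (the hypothetical `part_of` mapping) and that neighbors stored on the local partition are simply given a larger sampling weight (the hypothetical `local_bias` parameter), so that fewer sampled neighbors have to be fetched from other compute nodes. Weighted sampling without replacement is done with Efraimidis-Spirakis keys, which is a choice made here for illustration.

```python
import random


def biased_sample_neighbors(neighbors, part_of, local_part, fanout,
                            local_bias=4.0, rng=None):
    """Sample up to `fanout` distinct neighbors, favoring local ones.

    Assumption for illustration: a neighbor on `local_part` gets weight
    `local_bias`, every remote neighbor gets weight 1.0.
    """
    rng = rng or random.Random()
    if len(neighbors) <= fanout:
        return list(neighbors)
    keyed = []
    for v in neighbors:
        w = local_bias if part_of[v] == local_part else 1.0
        # Efraimidis-Spirakis key u**(1/w): larger weights tend to
        # produce larger keys, so they are more likely to be kept.
        keyed.append((rng.random() ** (1.0 / w), v))
    keyed.sort(reverse=True)
    return [v for _, v in keyed[:fanout]]


def remote_count(sampled, part_of, local_part):
    """Number of sampled neighbors that live on another partition."""
    return sum(1 for v in sampled if part_of[v] != local_part)


# Toy setup: vertices 0-9 on partition 0, vertices 10-19 on partition 1.
part_of = {v: 0 if v < 10 else 1 for v in range(20)}
neighbors = list(range(20))
sample = biased_sample_neighbors(neighbors, part_of, local_part=0,
                                 fanout=5, local_bias=10.0,
                                 rng=random.Random(0))
print(sample, remote_count(sample, part_of, local_part=0))
```

Setting `local_bias` to 1.0 recovers uniform sampling without replacement. Raising it trades fewer remote fetches for a sample that is less representative of the full neighborhood, which is the accuracy concern named in challenge (3).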