{"title":"Accelerating GNN Inference by Soft Channel Pruning","authors":"Wenbo Zhang, Jingwei Sun, Guangzhong Sun","doi":"10.1109/PAAP56126.2022.10010603","DOIUrl":null,"url":null,"abstract":"Graph Neural Networks (GNNs) are effective models for processing graph-structured data. With the continuous growth of graph data scale and the deepening of graph neural network layers, the heavy cost of GNN inference has greatly limited its application in real-time tasks. This paper focus on accelerating the performance of GNN inference. We first measures the execution time ratio of each stage in the inference process for commonly used GNN models, and analyzes the performance characteristics of different stages. We find out that the critical performance factor of GNN inference is the feature dimension, which is different to CNN and NLP models. Therefore, we propose a soft channel pruning method with a ladder pruning pattern. It reduces the calculation from unimportant graph node features and achieve performance acceleration. Meanwhile, it reserves inference accuracy of GNNs. According to experimental validation on graph datasets of different scales, our method can effectively reduce the inference latency and achieve 2×–7× speedup. Also, compared with existing pruning methods, higher inference accuracy can be obtained with comparable speedup ratio.","PeriodicalId":336339,"journal":{"name":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PAAP56126.2022.10010603","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Graph Neural Networks (GNNs) are effective models for processing graph-structured data. With the continuous growth of graph data scale and the deepening of GNN layers, the heavy cost of GNN inference has greatly limited its application in real-time tasks. This paper focuses on accelerating GNN inference. We first measure the execution time of each stage in the inference process for commonly used GNN models and analyze the performance characteristics of the different stages. We find that the critical performance factor in GNN inference is the feature dimension, which differs from CNN and NLP models. Therefore, we propose a soft channel pruning method with a ladder pruning pattern. It reduces the computation spent on unimportant graph node features and thereby accelerates inference, while preserving the inference accuracy of GNNs. In experimental validation on graph datasets of different scales, our method effectively reduces inference latency, achieving 2×–7× speedup. Moreover, compared with existing pruning methods, it obtains higher inference accuracy at a comparable speedup ratio.
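To make the idea concrete, below is a minimal PyTorch sketch of soft channel pruning with a ladder-style pattern on a toy GCN-like layer. The details are assumptions for illustration, not the authors' exact method: channel importance is approximated here by the L1 norm of each input channel's outgoing weights, and the "ladder" pattern is realized by keeping a progressively smaller fraction of channels at successive layers. "Soft" pruning means masked channels are zeroed but remain trainable, so they can be recovered.

```python
# Illustrative sketch only. Assumptions (not from the paper): L1-norm
# channel importance, and a ladder pattern that keeps fewer channels
# in deeper layers (e.g. 100% -> 75% -> 50%).
import torch
import torch.nn as nn

class SoftPrunedLayer(nn.Module):
    """GCN-like layer whose input feature channels are softly masked."""
    def __init__(self, in_dim: int, out_dim: int, keep_ratio: float):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.keep_ratio = keep_ratio
        self.register_buffer("mask", torch.ones(in_dim))

    def update_mask(self):
        # Rank input channels by the L1 norm of their outgoing weights
        # (an assumed importance criterion) and keep the top fraction.
        importance = self.lin.weight.abs().sum(dim=0)   # shape: (in_dim,)
        k = max(1, int(self.keep_ratio * importance.numel()))
        top = importance.topk(k).indices
        self.mask.zero_()
        self.mask[top] = 1.0  # soft: pruned weights stay trainable

    def forward(self, x, adj):
        x = x * self.mask          # zero out unimportant feature channels
        return adj @ self.lin(x)   # transform, then aggregate over the graph

# Ladder pattern: successive layers keep 100%, 75%, 50% of channels.
layers = nn.ModuleList(
    SoftPrunedLayer(128, 128, keep_ratio=r) for r in (1.0, 0.75, 0.5)
)

x = torch.randn(1000, 128)   # toy node features
adj = torch.eye(1000)        # toy adjacency (self-loops only)
for layer in layers:
    layer.update_mask()
    x = torch.relu(layer(x, adj))
print(x.shape)  # torch.Size([1000, 128])
```

Note that the mask above only illustrates the pruning pattern; to realize the measured speedup at inference time, the masked channels would be physically removed (the weight matrices sliced) so the feature dimension, identified in the abstract as the critical performance factor, actually shrinks.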