Yunzhen Luo;Yan Ding;Zhuo Tang;Keqin Li;Kenli Li;Chubo Liu
{"title":"面向图神经网络的统一位稀疏感知加速器BEAST-GNN","authors":"Yunzhen Luo;Yan Ding;Zhuo Tang;Keqin Li;Kenli Li;Chubo Liu","doi":"10.1109/TC.2025.3558587","DOIUrl":null,"url":null,"abstract":"Graph Neural Networks (GNNs) excel in processing graph-structured data, making them attractive and promising for tasks such as recommender systems and traffic forecasting. However, GNNs’ irregular computational patterns limit their ability to achieve low latency and high energy efficiency, particularly in edge computing environments. Current GNN accelerators predominantly focus on value sparsity, underutilizing the potential performance gains from bit-level sparsity. However, applying existing bit-serial accelerators to GNNs presents several challenges. These challenges arise from GNNs’ more complex data flow compared to conventional neural networks, as well as difficulties in data localization and load balancing with irregular graph data. To address these challenges, we propose BEAST-GNN, a bit-serial GNN accelerator that fully exploits bit-level sparsity. BEAST-GNN introduces streamlined sparse-dense bit matrix multiplication for optimized data flow, a column-overlapped graph partitioning method to enhance data locality by reducing memory access inefficiencies, and a sparse bit-counting strategy to ensure balanced workload distribution across processing elements (PEs). 
Compared to state-of-the-art accelerators, including HyGCN, GCNAX, Laconic, GROW, I-GCN, SGCN, and MEGA, BEAST-GNN achieves speedups of 21.7<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 6.4<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 10.5<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 3.7<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 4.0<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 3.3<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, and 1.4<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula> respectively, while also reducing DRAM access by 36.3<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 7.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 6.6<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 3.9<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 5.38<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, 3.37<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>, and 1.44<inline-formula><tex-math>$\\boldsymbol{\\times}$</tex-math></inline-formula>. 
Additionally, BEAST-GNN consumes only 4.8%, 12.4%, 19.6%, 27.7%, 17.0%, 26.5%, and 82.8% of the energy required by these architectures.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 7","pages":"2402-2416"},"PeriodicalIF":3.8000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"BEAST-GNN: A United Bit Sparsity-Aware Accelerator for Graph Neural Networks\",\"authors\":\"Yunzhen Luo;Yan Ding;Zhuo Tang;Keqin Li;Kenli Li;Chubo Liu\",\"doi\":\"10.1109/TC.2025.3558587\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graph Neural Networks (GNNs) excel in processing graph-structured data, making them attractive and promising for tasks such as recommender systems and traffic forecasting. However, GNNs’ irregular computational patterns limit their ability to achieve low latency and high energy efficiency, particularly in edge computing environments. Current GNN accelerators predominantly focus on value sparsity, underutilizing the potential performance gains from bit-level sparsity. However, applying existing bit-serial accelerators to GNNs presents several challenges. These challenges arise from GNNs’ more complex data flow compared to conventional neural networks, as well as difficulties in data localization and load balancing with irregular graph data. To address these challenges, we propose BEAST-GNN, a bit-serial GNN accelerator that fully exploits bit-level sparsity. BEAST-GNN introduces streamlined sparse-dense bit matrix multiplication for optimized data flow, a column-overlapped graph partitioning method to enhance data locality by reducing memory access inefficiencies, and a sparse bit-counting strategy to ensure balanced workload distribution across processing elements (PEs). 
Compared to state-of-the-art accelerators, including HyGCN, GCNAX, Laconic, GROW, I-GCN, SGCN, and MEGA, BEAST-GNN achieves speedups of 21.7<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 6.4<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 10.5<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 3.7<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 4.0<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 3.3<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, and 1.4<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula> respectively, while also reducing DRAM access by 36.3<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 7.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 6.6<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 3.9<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 5.38<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, 3.37<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>, and 1.44<inline-formula><tex-math>$\\\\boldsymbol{\\\\times}$</tex-math></inline-formula>. 
Additionally, BEAST-GNN consumes only 4.8%, 12.4%, 19.6%, 27.7%, 17.0%, 26.5%, and 82.8% of the energy required by these architectures.\",\"PeriodicalId\":13087,\"journal\":{\"name\":\"IEEE Transactions on Computers\",\"volume\":\"74 7\",\"pages\":\"2402-2416\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computers\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10955485/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10955485/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
BEAST-GNN: A United Bit Sparsity-Aware Accelerator for Graph Neural Networks
Graph Neural Networks (GNNs) excel in processing graph-structured data, making them attractive and promising for tasks such as recommender systems and traffic forecasting. However, GNNs’ irregular computational patterns limit their ability to achieve low latency and high energy efficiency, particularly in edge computing environments. Current GNN accelerators predominantly focus on value sparsity, underutilizing the potential performance gains from bit-level sparsity. However, applying existing bit-serial accelerators to GNNs presents several challenges. These challenges arise from GNNs’ more complex data flow compared to conventional neural networks, as well as difficulties in data localization and load balancing with irregular graph data. To address these challenges, we propose BEAST-GNN, a bit-serial GNN accelerator that fully exploits bit-level sparsity. BEAST-GNN introduces streamlined sparse-dense bit matrix multiplication for optimized data flow, a column-overlapped graph partitioning method to enhance data locality by reducing memory access inefficiencies, and a sparse bit-counting strategy to ensure balanced workload distribution across processing elements (PEs). Compared to state-of-the-art accelerators, including HyGCN, GCNAX, Laconic, GROW, I-GCN, SGCN, and MEGA, BEAST-GNN achieves speedups of 21.7×, 6.4×, 10.5×, 3.7×, 4.0×, 3.3×, and 1.4× respectively, while also reducing DRAM access by 36.3×, 7.9×, 6.6×, 3.9×, 5.38×, 3.37×, and 1.44×. Additionally, BEAST-GNN consumes only 4.8%, 12.4%, 19.6%, 27.7%, 17.0%, 26.5%, and 82.8% of the energy required by these architectures.
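To give a feel for why bit-level sparsity matters, the sketch below illustrates the general principle behind bit-serial arithmetic with zero-bit skipping. This is a conceptual illustration only, not BEAST-GNN's hardware design: the function name `bit_serial_mac` and its interface are invented for this example. A dense bit-serial multiply of an n-bit weight always costs n shift-add steps; a PE that skips zero bits pays only for the set ("essential") bits, which is the source of the savings the abstract refers to.

```python
def bit_serial_mac(activations, weights, bits=8):
    """Accumulate sum(a * w) by processing each weight one bit at a
    time, performing a shift-add only for bits that are set.

    Returns the accumulated dot product and the number of shift-add
    steps actually performed (a proxy for PE cycles)."""
    acc = 0
    work = 0  # shift-add steps actually executed
    for a, w in zip(activations, weights):
        for i in range(bits):
            if (w >> i) & 1:      # zero bits are skipped entirely
                acc += a << i     # shift-add for a set bit
                work += 1
    return acc, work

# Weights 17 (0b00010001) and 2 (0b00000010) have only 3 set bits in
# total, so the dot product 3*17 + 5*2 = 61 costs 3 shift-adds instead
# of the 2 * 8 = 16 a dense 8-bit bit-serial datapath would spend.
acc, work = bit_serial_mac([3, 5], [17, 2], bits=8)
```

The same idea scales to matrices: the fewer set bits in the operand stream, the fewer cycles a bit-serial PE array spends, which is the gain value-sparsity-only accelerators leave on the table.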
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.