GATe: Efficient Graph Attention Network Acceleration With Near-Memory Processing
Shiyan Yi; Yudi Qiu; Guohao Xu; Lingfei Lu; Xiaoyang Zeng; Yibo Fan
IEEE Transactions on Computers, vol. 74, no. 10, pp. 3419-3432, published 2025-07-11. DOI: 10.1109/TC.2025.3588317
https://ieeexplore.ieee.org/document/11078437/
Citations: 0
Abstract
Graph Attention Network (GAT) has gained widespread adoption thanks to its exceptional performance in processing non-Euclidean graphs. The critical components of a GAT model are attention and aggregation, which incur numerous main-memory accesses and account for a significant share of inference time. Recently, much research has proposed near-memory processing (NMP) architectures to accelerate aggregation. However, graph attention requires additional operations distinct from aggregation, making previous NMP architectures less suitable for supporting GAT, as they typically target aggregation-only workloads. In this paper, we propose GATe, a practical and efficient GAT accelerator with an NMP architecture. To the best of our knowledge, this is the first work to accelerate both attention and aggregation computation on DIMMs. We unify feature vector access to eliminate the two repetitive memory accesses to source nodes caused by the sequential phase-by-phase execution of attention and aggregation. Next, we refine the computation flow to reduce data dependencies in concatenation and softmax, which lowers on-chip memory usage and communication overhead. Additionally, we introduce a novel sharding method that enhances data reusability for high-degree nodes. Experiments show that GATe speeds up the GAT attention and aggregation phases by up to 6.77$\times$ and 2.46$\times$ (3.69$\times$ and 2.24$\times$ on average), respectively, compared to the state-of-the-art NMP works GNNear and GraNDe.
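For readers unfamiliar with the two phases the abstract refers to, the standard GAT formulation (Veličković et al., 2018) makes the distinction concrete; the paper's own flow details are not reproduced here. The per-edge attention phase computes coefficients via concatenation, a LeakyReLU, and a neighborhood softmax:

$$\alpha_{ij} = \mathrm{softmax}_{j \in \mathcal{N}(i)}\!\left( \mathrm{LeakyReLU}\!\left( \mathbf{a}^{\top} \left[ \mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j \right] \right) \right)$$

and the aggregation phase then combines neighbor features weighted by those coefficients:

$$\mathbf{h}_i' = \sigma\!\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, \mathbf{W}\mathbf{h}_j \right)$$

Note that both phases read the transformed source-node features $\mathbf{W}\mathbf{h}_j$; executing the phases sequentially fetches these vectors from main memory twice, which is the redundancy the unified feature vector access described above targets.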
Journal Description
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.