Siddhartha Raman Sundara Raman, Lizy John, Jaydeep P. Kulkarni
Title: NEM-GNN - DAC/ADC-less, scalable, reconfigurable, graph and sparsity-aware near-memory accelerator for graph neural networks
Journal: ACM Transactions on Architecture and Code Optimization
Published: 2024-03-14
DOI: 10.1145/3652607 (https://doi.org/10.1145/3652607)
Citation count: 0
Abstract
Graph neural networks (GNNs) are of great interest in real-life applications such as citation networks and drug discovery, owing to their ability to apply machine learning techniques to graphs. GNNs use a two-step approach to classify the nodes of a graph into predefined categories. The first step uses a combination kernel to perform data-intensive convolution operations with regular memory access patterns. The second step uses an aggregation kernel that operates on sparse data with irregular access patterns. These mixed data patterns render CPU/GPU-based compute energy-inefficient. Von Neumann-based accelerators such as AWB-GCN [7] suffer from increased data movement, because the data-intensive combination step requires large transfers to and from memory to perform computations. ReFLIP [8] performs resistive random-access memory (ReRAM)-based processing-in-memory (PIM) compute to overcome data movement costs. However, ReFLIP suffers from increased area requirements due to its dedicated accelerator arrangement, reduced performance due to limited parallelism, and increased energy due to fundamental issues in ReRAM-based compute. This paper presents a scalable (non-exponential storage requirement), DAC/ADC-less PIM-based combination with (i) early compute termination and (ii) pre-compute by reconfiguring SoC components. Graph- and sparsity-aware near-memory aggregation using the proposed compute-as-soon-as-ready (CAR) broadcast approach further improves performance and energy. NEM-GNN achieves ∼80-230x, ∼80-300x, ∼850-1134x, and ∼7-8x improvements over ReFLIP in performance, throughput, energy efficiency, and compute density, respectively.
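The two kernels named in the abstract can be illustrated with a small software model. The sketch below is not the NEM-GNN hardware and does not model early termination, pre-compute, or CAR; it only assumes the common GCN formulation in which combination is a dense feature-by-weight product and aggregation is an unnormalized sum over each node's neighbors. The function names, the 3-node graph, and all values are hypothetical.

```python
# Illustrative sketch only: a software model of the two GNN kernels,
# not the NEM-GNN accelerator. All names and data are hypothetical.

def combine(features, weights):
    """Combination kernel: dense product X @ W (regular, streaming accesses)."""
    n_out = len(weights[0])
    return [
        [sum(row[k] * weights[k][j] for k in range(len(weights)))
         for j in range(n_out)]
        for row in features
    ]

def aggregate(neighbors, combined):
    """Aggregation kernel: sum the combined features of each node's neighbors.

    `neighbors` is an adjacency list, so only the nonzero entries of the
    sparse adjacency matrix are visited. Which rows of `combined` are read
    depends on the graph, which makes the access pattern irregular.
    """
    width = len(combined[0])
    out = []
    for nbrs in neighbors:
        acc = [0.0] * width
        for v in nbrs:
            for j in range(width):
                acc[j] += combined[v][j]
        out.append(acc)
    return out

# Hypothetical path graph 0-1-2 with self-loops.
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # X: 3 nodes x 2 features
weights = [[1.0, 2.0], [3.0, 4.0]]               # W: 2 x 2

combined = combine(features, weights)        # step 1: combination
aggregated = aggregate(neighbors, combined)  # step 2: aggregation
print(aggregated)
```

Running combination before aggregation, as here, means the sparse step touches only the already-projected features. A real GCN layer would typically also normalize by node degree and apply a nonlinearity, both omitted from this sketch.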
About the journal:
ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.