NEM-GNN - DAC/ADC-less, scalable, reconfigurable, graph and sparsity-aware near-memory accelerator for graph neural networks

IF 1.5 | CAS Tier 3 (Computer Science) | JCR Q4 (Computer Science, Hardware & Architecture)
Siddhartha Raman Sundara Raman, Lizy John, Jaydeep P. Kulkarni
{"title":"NEM-GNN - 用于图神经网络的无 DAC/ADC、可扩展、可重构、图形和稀疏感知的近内存加速器","authors":"Siddhartha Raman Sundara Raman, Lizy John, Jaydeep P. Kulkarni","doi":"10.1145/3652607","DOIUrl":null,"url":null,"abstract":"<p>Graph neural networks (GNN) are of great interest in real-life applications such as citation networks, drug discovery owing to GNN’s ability to apply machine learning techniques on graphs. GNNs utilize a two-step approach to classify the nodes in a graph into pre-defined categories. The first step uses a combination kernel to perform data-intensive convolution operations with regular memory access patterns. The second step uses an aggregation kernel that operates on sparse data having irregular access patterns. These mixed data patterns render CPU/GPU based compute energy-inefficient. Von-Neumann-based accelerators like AWB-GCN [7] suffer from increased data movement, as the data-intensive combination requires large data movement to/from memory to perform computations. ReFLIP [8] performs Resistive Random Access memory-based in-memory (PIM) compute to overcome data movement costs. However, ReFLIP suffers from increased area requirement due to dedicated accelerator arrangement, reduced performance due to limited parallelism and energy due to fundamental issues in ReRAM-based compute. This paper presents a scalable (non-exponential storage requirement), DAC/ADC-less PIM-based combination, with (i) early compute termination, (ii) pre-compute by reconfiguring SOC components. Graph and sparsity-aware near-memory aggregation using the proposed compute-as-soon-as-ready (CAR), broadcast approach improves performance and energy further. NEM-GNN achieves ∼ 80-230x, ∼ 80-300x, ∼ 850-1134x, and ∼ 7-8x improvement over ReFLIP, in terms of performance, throughput, energy efficiency and compute density.</p>","PeriodicalId":50920,"journal":{"name":"ACM Transactions on Architecture and Code Optimization","volume":"23 1","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"NEM-GNN - DAC/ADC-less, scalable, reconfigurable, graph and sparsity-aware near-memory accelerator for graph neural networks\",\"authors\":\"Siddhartha Raman Sundara Raman, Lizy John, Jaydeep P. Kulkarni\",\"doi\":\"10.1145/3652607\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Graph neural networks (GNN) are of great interest in real-life applications such as citation networks, drug discovery owing to GNN’s ability to apply machine learning techniques on graphs. GNNs utilize a two-step approach to classify the nodes in a graph into pre-defined categories. The first step uses a combination kernel to perform data-intensive convolution operations with regular memory access patterns. The second step uses an aggregation kernel that operates on sparse data having irregular access patterns. These mixed data patterns render CPU/GPU based compute energy-inefficient. Von-Neumann-based accelerators like AWB-GCN [7] suffer from increased data movement, as the data-intensive combination requires large data movement to/from memory to perform computations. ReFLIP [8] performs Resistive Random Access memory-based in-memory (PIM) compute to overcome data movement costs. However, ReFLIP suffers from increased area requirement due to dedicated accelerator arrangement, reduced performance due to limited parallelism and energy due to fundamental issues in ReRAM-based compute. 
This paper presents a scalable (non-exponential storage requirement), DAC/ADC-less PIM-based combination, with (i) early compute termination, (ii) pre-compute by reconfiguring SOC components. Graph and sparsity-aware near-memory aggregation using the proposed compute-as-soon-as-ready (CAR), broadcast approach improves performance and energy further. NEM-GNN achieves ∼ 80-230x, ∼ 80-300x, ∼ 850-1134x, and ∼ 7-8x improvement over ReFLIP, in terms of performance, throughput, energy efficiency and compute density.</p>\",\"PeriodicalId\":50920,\"journal\":{\"name\":\"ACM Transactions on Architecture and Code Optimization\",\"volume\":\"23 1\",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Architecture and Code Optimization\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3652607\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Architecture and Code Optimization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3652607","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Graph neural networks (GNNs) are of great interest in real-life applications such as citation networks and drug discovery, owing to their ability to apply machine learning techniques to graphs. GNNs use a two-step approach to classify the nodes of a graph into pre-defined categories. The first step uses a combination kernel to perform data-intensive convolution operations with regular memory access patterns. The second step uses an aggregation kernel that operates on sparse data with irregular access patterns. These mixed data patterns make CPU/GPU-based compute energy-inefficient. Von Neumann-based accelerators such as AWB-GCN [7] suffer from increased data movement, as the data-intensive combination step requires large transfers to and from memory to perform computations. ReFLIP [8] performs resistive RAM (ReRAM)-based processing-in-memory (PIM) compute to overcome data-movement costs. However, ReFLIP suffers from increased area due to its dedicated accelerator arrangement, reduced performance due to limited parallelism, and increased energy due to fundamental issues in ReRAM-based compute. This paper presents a scalable (non-exponential storage requirement), DAC/ADC-less PIM-based combination kernel with (i) early compute termination and (ii) pre-compute by reconfiguring SoC components. Graph- and sparsity-aware near-memory aggregation using the proposed compute-as-soon-as-ready (CAR) broadcast approach further improves performance and energy. NEM-GNN achieves ~80-230x, ~80-300x, ~850-1134x, and ~7-8x improvements over ReFLIP in performance, throughput, energy efficiency, and compute density, respectively.
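To make the two-kernel structure concrete, here is a minimal software sketch of one graph-convolution layer (a generic NumPy/SciPy illustration of the GCN dataflow the abstract describes, not the NEM-GNN hardware or the paper's code): the dense combination step has regular memory access, while the sparse aggregation step's accesses into the feature matrix are dictated by the graph's adjacency structure.

```python
# Minimal sketch of the two-step GNN layer from the abstract (illustrative only).
import numpy as np
import scipy.sparse as sp

def gcn_layer(adj: sp.csr_matrix, x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: sparse aggregation over a dense combination."""
    # Step 1 - combination kernel: dense matrix multiply with a
    # regular, predictable memory access pattern.
    h = x @ w
    # Step 2 - aggregation kernel: sparse-dense multiply; the nonzero
    # pattern of `adj` makes the reads from `h` irregular.
    return adj @ h

# Toy example: 4-node graph, 3 input features, 2 output features.
adj = sp.csr_matrix(np.array([[0, 1, 0, 1],
                              [1, 0, 1, 0],
                              [0, 1, 0, 0],
                              [1, 0, 0, 0]], dtype=np.float32))
x = np.random.rand(4, 3).astype(np.float32)
w = np.random.rand(3, 2).astype(np.float32)
print(gcn_layer(adj, x, w).shape)  # (4, 2)
```

The sketch only shows the dataflow the accelerator targets: in NEM-GNN, the combination step is performed in memory without DACs/ADCs, and the aggregation step is performed near memory with the graph- and sparsity-aware CAR broadcast scheme.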

Source journal
ACM Transactions on Architecture and Code Optimization (Engineering & Technology - Computer Science: Theory & Methods)
CiteScore: 3.60
Self-citation rate: 6.20%
Publication volume: 78
Review time: 6-12 weeks
About the journal: ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.