A Data Prefetcher-Based 1000-Core RISC-V Processor for Efficient Processing of Graph Neural Networks

IF 1.4, Zone 3 (Computer Science), Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Computer Architecture Letters, Pub Date: 2025-02-26, DOI: 10.1109/LCA.2025.3545799

Omer Khan

Citations: 0

Abstract

Graph-based neural networks have seen tremendous adoption for complex predictive analytics on massive real-world graphs. Hardware acceleration efforts have identified significant challenges in harnessing graph locality and in managing workload imbalance, both caused by ultra-sparse, irregular matrix computations at a massively parallel scale. State-of-the-art hardware accelerators rely on massive multithreading and asynchronous execution in GPUs, achieving parallel performance at the cost of high power consumption. This paper aims to bridge this power-performance gap using the energy-efficiency-centric RISC-V ecosystem. A 1000-core RISC-V processor is proposed that unlocks massive parallelism in graph-based matrix operators through a low-latency data access paradigm implemented in hardware, achieving robust power-performance scaling. Each core implements a single-threaded pipeline with a novel graph-aware data prefetcher; at the 1000-core scale, the design delivers an average 20× performance-per-watt advantage over a state-of-the-art NVIDIA GPU.
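The abstract does not describe how the prefetcher or the work distribution is built, so the sketch below only illustrates the kind of access pattern involved; it is not the paper's hardware design. It assumes a graph stored in CSR form and a GNN sum aggregation (an SpMM-style operator), and it uses two stand-in techniques chosen for illustration: edge-balanced partitioning of vertices across cores (one common answer to workload imbalance), and a software lookahead prefetch via the GCC/Clang builtin `__builtin_prefetch` in place of a hardware prefetcher. The names `csr_graph`, `partition_by_edges`, `aggregate_range`, and `PREFETCH_DIST` are hypothetical.

```c
#include <stddef.h>

/* CSR adjacency: neighbors of vertex v are col_idx[row_ptr[v] .. row_ptr[v+1]-1]. */
typedef struct {
    size_t n;              /* number of vertices */
    const size_t *row_ptr; /* n + 1 entries */
    const size_t *col_idx; /* row_ptr[n] entries */
} csr_graph;

/* Hypothetical lookahead distance, measured in edges (an assumption, not from the paper). */
#define PREFETCH_DIST 4

/* Assign each of `ncores` cores a contiguous vertex range holding roughly
   nnz / ncores edges, so a few high-degree vertices do not overload one core.
   Core c owns vertices start[c] .. start[c+1]-1; `start` has ncores + 1 slots. */
void partition_by_edges(const csr_graph *g, size_t ncores, size_t *start) {
    size_t nnz = g->row_ptr[g->n];
    size_t v = 0;
    start[0] = 0;
    for (size_t c = 1; c < ncores; c++) {
        size_t target = nnz * c / ncores; /* cumulative edge budget before core c */
        while (v < g->n && g->row_ptr[v] < target)
            v++;
        start[c] = v;
    }
    start[ncores] = g->n;
}

/* Sum aggregation over one vertex range: out[v] = sum of in[u] over neighbors u,
   for v in [lo, hi). Feature matrices are row-major with `dim` floats per row. */
void aggregate_range(const csr_graph *g, const float *in, float *out,
                     size_t dim, size_t lo, size_t hi) {
    size_t stop = g->row_ptr[hi]; /* one past the last edge owned by this range */
    for (size_t v = lo; v < hi; v++) {
        float *dst = out + v * dim;
        for (size_t k = 0; k < dim; k++)
            dst[k] = 0.0f;
        for (size_t e = g->row_ptr[v]; e < g->row_ptr[v + 1]; e++) {
            /* Index-driven lookahead: read the neighbor ID PREFETCH_DIST edges
               ahead (possibly in the next vertex's list) and prefetch the first
               cache line of that neighbor's feature row (read, low locality). */
            if (e + PREFETCH_DIST < stop)
                __builtin_prefetch(in + g->col_idx[e + PREFETCH_DIST] * dim, 0, 1);
            const float *src = in + g->col_idx[e] * dim;
            for (size_t k = 0; k < dim; k++)
                dst[k] += src[k];
        }
    }
}
```

For example, on a 4-vertex star whose hub (vertex 0) has three neighbors, `partition_by_edges(&g, 2, start)` yields `start = {0, 1, 4}`: the hub alone goes to core 0 and the three leaves go to core 1, three edges each. Each core would then call `aggregate_range` on its own range; a hardware prefetcher could follow the same neighbor-index chain without consuming pipeline slots.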


Source journal

IEEE Computer Architecture Letters (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 4.60

Self-citation rate: 4.30%

Journal description: IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.