Vesper: A Versatile Sparse Linear Algebra Accelerator With Configurable Compute Patterns
Authors: Hanchen Jin; Zichao Yue; Zhongyuan Zhao; Yixiao Du; Chenhui Deng; Nitish Srivastava; Zhiru Zhang
DOI: 10.1109/TCAD.2024.3496882
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 5, pp. 1731-1744
Publication date: 2024-11-13
URL: https://ieeexplore.ieee.org/document/10752521/
Citations: 0
Abstract
Sparse linear algebra (SLA) operations are fundamental building blocks for many important applications, such as data analytics, graph processing, machine learning, and scientific computing. In particular, four compute kernels in SLA are widely used: sparse-matrix-dense-vector multiplication, sparse-matrix-dense-matrix multiplication, sparse-matrix-sparse-vector multiplication, and sparse-matrix-sparse-matrix multiplication. Recently, an active area of research has emerged to build specialized hardware accelerators for these SLA kernels. However, existing efforts mostly focus on accelerating a single kernel, and the proposed accelerator architectures are often limited to a specific compute pattern, such as inner, outer, or row-wise product. This work proposes Vesper, a high-performance and versatile sparse accelerator that supports all four important SLA kernels while being configurable to execute the compute patterns suitable for different kernels under various degrees of sparsity. To enable rapid exploration of the large architectural design and configuration space, we devise an analytical model to estimate the performance of an SLA kernel running on a given hardware configuration using a specific compute pattern. Guided by our model, we build a flexible yet efficient accelerator architecture that maximizes resource sharing among the hardware modules used for the different SLA kernels and their associated compute patterns. We evaluate the performance of Vesper using gem5 on a diverse set of matrices from SuiteSparse. Our experimental results show that Vesper achieves throughput comparable to or higher than that of state-of-the-art accelerators tailor-made for a specific SLA kernel, with better bandwidth efficiency. In addition, we evaluate Vesper on a real-world application, label propagation (LP), an iterative graph-based learning algorithm that involves multiple SLA kernels and exhibits varying degrees of sparsity across iterations. Compared to CPU- and GPU-based executions, Vesper speeds up the LP algorithm by 12.0× and 1.7×, respectively.
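To make the "compute pattern" terminology concrete, the sketch below illustrates the row-wise product pattern (Gustavson's algorithm) for sparse-matrix-sparse-matrix multiplication, one of the patterns the abstract mentions. This is an illustrative software model only, not Vesper's hardware implementation; the dict-per-row sparse representation and function name are assumptions for the example. In the row-wise pattern, each output row of C is accumulated as a sparse combination of rows of B, selected and scaled by the nonzeros in the corresponding row of A.

```python
def spgemm_rowwise(a_rows, b_rows):
    """Row-wise-product SpGEMM sketch.

    a_rows, b_rows: one {column_index: value} dict per matrix row,
    storing only nonzeros (a CSR-like sparse format).
    Returns C = A @ B in the same format.
    """
    c_rows = []
    for a_row in a_rows:
        acc = {}  # sparse accumulator for one output row of C
        for k, a_val in a_row.items():
            # Scale row k of B by A[i][k] and merge into the accumulator.
            for j, b_val in b_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        c_rows.append(acc)
    return c_rows


# A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]  (sparse form below)
A = [{0: 1.0}, {1: 2.0}]
B = [{1: 3.0}, {0: 4.0}]
print(spgemm_rowwise(A, B))  # [{1: 3.0}, {0: 8.0}]
```

Unlike the inner-product pattern (one dot product per output element) or the outer-product pattern (one rank-1 update per column of A), the row-wise pattern streams each row of A once and reuses rows of B, which is why the suitable pattern depends on the operands' sparsity — the trade-off Vesper's configurability targets.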
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.