Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, Tiejun Li
{"title":"SparGD: A Sparse GEMM Accelerator with Dynamic Dataflow","authors":"Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, Tiejun Li","doi":"10.1145/3634703","DOIUrl":null,"url":null,"abstract":"<p>Deep learning has become a highly popular research field, and previously deep learning algorithms ran primarily on CPUs and GPUs. However, with the rapid development of deep learning, it was discovered that existing processors could not meet the specific large-scale computing requirements of deep learning, and custom deep learning accelerators have become popular. The majority of the primary workloads in deep learning are general matrix-matrix multiplications (GEMM), and emerging GEMMs are highly sparse and irregular. The TPU and SIGMA are typical GEMM accelerators in recent years, but the TPU does not support sparsity, and both the TPU and SIGMA have insufficient utilization rates of the Processing Element (PE). We design and implement the SparGD, a sparse GEMM accelerator with dynamic dataflow. The SparGD has specific PE structures, flexible distribution networks and reduction networks, and a simple dataflow switching module. When running sparse and irregular GEMMs, the SparGD can maintain high PE utilization while utilizing sparsity, and can switch to the optimal dataflow according to the computing environment. For sparse, irregular GEMMs, our experimental results show that the SparGD outperforms systolic arrays by 30 times and SIGMA by 3.6 times.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2023-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Design Automation of Electronic Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3634703","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Deep learning has become a highly popular research field, and previously deep learning algorithms ran primarily on CPUs and GPUs. However, with the rapid development of deep learning, it was discovered that existing processors could not meet the specific large-scale computing requirements of deep learning, and custom deep learning accelerators have become popular. The majority of the primary workloads in deep learning are general matrix-matrix multiplications (GEMM), and emerging GEMMs are highly sparse and irregular. The TPU and SIGMA are typical GEMM accelerators in recent years, but the TPU does not support sparsity, and both the TPU and SIGMA have insufficient utilization rates of the Processing Element (PE). We design and implement the SparGD, a sparse GEMM accelerator with dynamic dataflow. The SparGD has specific PE structures, flexible distribution networks and reduction networks, and a simple dataflow switching module. When running sparse and irregular GEMMs, the SparGD can maintain high PE utilization while utilizing sparsity, and can switch to the optimal dataflow according to the computing environment. For sparse, irregular GEMMs, our experimental results show that the SparGD outperforms systolic arrays by 30 times and SIGMA by 3.6 times.
期刊介绍:
TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.