{"title":"IOPS: A Unified SpMM Accelerator Based on Inner-Outer-Hybrid Product","authors":"Wenhao Sun;Wendi Sun;Song Chen;Yi Kang","doi":"10.1109/TC.2025.3558013","DOIUrl":null,"url":null,"abstract":"Sparse matrix multiplication (SpMM) is widely applied to numerous domains, such as graph processing and machine learning. However, inner product (IP) induces redundant zero-element computing for mismatched nonzero operands, while outer product (OP) lacks input reuse across Process Elements (PEs). Besides, current accelerators only focus on sparse-sparse matrix multiplication (SSMM) or sparse-dense matrix multiplication (SDMM), rarely performing efficiently for both. To compensate for the shortcomings of IP and OP, we propose an inner-outer-hybrid product (IOHP) method, which reuses the input matrix among PEs with IP and removes zero-element calculations with OP in each PE. Based on IOHP, we co-design a accelerator with a unified computing flow, called IOPS, to efficiently process both SSMM and SDMM. It divides the SpMM into three stages: encoding, partial sum (psum) calculation, and address mapping, where the input matrices can be reused among PEs after encoding (IP) and the zero element can be skipped in the latter two stages (OP). Furthermore, an adaptive partition strategy is proposed to tile the input matrices based on their sparsity ratios, effectively utilizing the on-chip storage and reducing DRAM access. Compared with SpArch, we achieve <inline-formula><tex-math>$1.2\\boldsymbol{\\times}$</tex-math></inline-formula>~<inline-formula><tex-math>$4.3\\boldsymbol{\\times}$</tex-math></inline-formula> performance and <inline-formula><tex-math>$1.3\\boldsymbol{\\times}$</tex-math></inline-formula>~<inline-formula><tex-math>$4.8\\boldsymbol{\\times}$</tex-math></inline-formula> energy efficiency, with <inline-formula><tex-math>$1.4\\boldsymbol{\\times}$</tex-math></inline-formula>~<inline-formula><tex-math>$2.1\\boldsymbol{\\times}$</tex-math></inline-formula> DRAM access saving.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 7","pages":"2210-2222"},"PeriodicalIF":3.8000,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10949697/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Sparse matrix multiplication (SpMM) is widely applied in numerous domains, such as graph processing and machine learning. However, the inner product (IP) dataflow induces redundant zero-element computation when nonzero operands are mismatched, while the outer product (OP) dataflow lacks input reuse across Processing Elements (PEs). Moreover, current accelerators focus on either sparse-sparse matrix multiplication (SSMM) or sparse-dense matrix multiplication (SDMM), and rarely perform efficiently on both. To compensate for the shortcomings of IP and OP, we propose an inner-outer-hybrid product (IOHP) method, which reuses the input matrix among PEs as in IP and removes zero-element calculations as in OP within each PE. Based on IOHP, we co-design an accelerator with a unified computing flow, called IOPS, to efficiently process both SSMM and SDMM. It divides SpMM into three stages: encoding, partial-sum (psum) calculation, and address mapping, where the input matrices can be reused among PEs after encoding (IP) and zero elements can be skipped in the latter two stages (OP). Furthermore, an adaptive partition strategy is proposed to tile the input matrices based on their sparsity ratios, effectively utilizing on-chip storage and reducing DRAM accesses. Compared with SpArch, we achieve $1.2\times$~$4.3\times$ the performance and $1.3\times$~$4.8\times$ the energy efficiency, with $1.4\times$~$2.1\times$ savings in DRAM accesses.
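The IP/OP trade-off the abstract describes is easiest to see in code. Below is a minimal Python sketch, not the authors' hardware dataflow: the dict-of-dicts sparse format, the function names, and the per-PE row partitioning are illustrative assumptions. It contrasts inner product (full inner-dimension scan, so mismatched nonzeros still cost work), outer product (nonzero-only multiplies, but per-PE inputs and psums to merge), and the hybrid idea of broadcasting one operand across PEs while zero-skipping inside each PE.

from collections import defaultdict

def inner_product_spmm(A, B, n_rows, n_cols, n_inner):
    """Inner product: C[i][j] = sum_k A[i][k] * B[k][j].
    Every (i, j) pair scans the whole inner dimension, so a probe is paid
    even when A[i][k] and B[k][j] nonzeros do not line up."""
    C = defaultdict(dict)
    for i in range(n_rows):
        for j in range(n_cols):
            acc = 0
            for k in range(n_inner):  # redundant work on mismatched operands
                acc += A.get(i, {}).get(k, 0) * B.get(k, {}).get(j, 0)
            if acc:
                C[i][j] = acc
    return C

def outer_product_spmm(A_cols, B_rows):
    """Outer product: C = sum_k (column k of A) x (row k of B).
    Only matched nonzeros are multiplied (zero-skipping), but the psums
    must be accumulated and mapped to their final addresses afterwards."""
    C = defaultdict(lambda: defaultdict(int))
    for k in A_cols.keys() & B_rows.keys():  # skip k empty on either side
        for i, a in A_cols[k].items():
            for j, b in B_rows[k].items():
                C[i][j] += a * b             # psum accumulation
    return C

def hybrid_spmm(A_row_blocks, B_rows):
    """Hybrid flavor: B_rows is broadcast to every PE (IP-style input
    reuse), while each PE runs an OP-style zero-skipping kernel on its
    own block of A rows; the per-PE psums would be merged afterwards."""
    psums = []
    for block in A_row_blocks:               # one block per PE
        A_cols = defaultdict(dict)           # transpose block to column-major
        for i, row in block.items():
            for k, a in row.items():
                A_cols[k][i] = a
        psums.append(outer_product_spmm(A_cols, B_rows))
    return psums

if __name__ == "__main__":
    # A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]  ->  C = [[0, 3], [8, 0]]
    A = {0: {0: 1}, 1: {1: 2}}
    B = {0: {1: 3}, 1: {0: 4}}
    print(dict(inner_product_spmm(A, B, 2, 2, 2)))  # {0: {1: 3}, 1: {0: 8}}
    print(hybrid_spmm([{0: A[0]}, {1: A[1]}], B))   # same values as per-PE psums

In hardware terms, broadcasting B_rows to every PE corresponds to the IP-style input reuse, the per-PE nonzero-only loop to the OP-style zero-skipping, and the returned per-PE psums to the psum-calculation and address-mapping stages named in the abstract.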
About the Journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.