Analysis of Several Sparse Formats for Matrices used in Sparse-Matrix Dense-Matrix Multiplication for Machine Learning on GPUs

Donghyeon Kim, Jinsung Kim
DOI: 10.1109/ICTC55196.2022.9952814
Published in: 2022 13th International Conference on Information and Communication Technology Convergence (ICTC)
Publication date: 2022-10-19
Citations: 2

Abstract

Sparse-matrix dense-matrix multiplication (SpMM) takes one sparse matrix and one dense matrix as inputs and outputs one dense matrix as the result. It plays a vital role in various fields such as deep neural networks, graph neural networks, and graph analysis. CUDA, NVIDIA's parallel computing platform, provides the cuSPARSE library to support Basic Linear Algebra Subroutines (BLAS) on sparse matrices, including SpMM. In sparse matrices, zero values can be discarded from storage and computation to accelerate execution. To represent only the non-zero values of a sparse matrix, the cuSPARSE library supports several sparse formats, such as COO (COOrdinate), CSR (Compressed Sparse Row), and CSC (Compressed Sparse Column). In addition, since the 3rd-generation Tensor Cores were introduced with the Ampere architecture, CUDA provides the cuSPARSELt library for SpMM whose sparse matrix satisfies a 2:4 sparsity pattern, i.e., approximately 50% sparsity, which can occur in machine learning. In this paper, we compare the cuSPARSE library and the cuSPARSELt library for SpMM in the case of sparse matrices with a 2:4 sparsity pattern (50% sparsity). Furthermore, we compare the performance of the three formats for SpMM in the cuSPARSE library at different sparsity levels: 75%, 87.5%, and 99%.
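The ideas in the abstract can be illustrated on the CPU with SciPy: the same three formats (COO, CSR, CSC) store only non-zero values with format-specific index arrays, SpMM multiplies a sparse matrix by a dense one, and the 2:4 sparsity pattern means every aligned group of 4 consecutive values in a row contains at most 2 non-zeros. This is a minimal sketch for intuition only, not the cuSPARSE/cuSPARSELt API; the matrix values and the helper `satisfies_2_4` are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix, csc_matrix

# A small matrix with a 2:4 sparsity pattern (~50% sparsity): each aligned
# group of 4 values in a row has at most 2 non-zeros. (Illustrative values.)
A_dense = np.array([
    [1.0, 0.0, 2.0, 0.0,  0.0, 3.0, 0.0, 4.0],
    [0.0, 5.0, 0.0, 6.0,  7.0, 0.0, 8.0, 0.0],
])

# The three formats store only the non-zeros, with different index arrays:
A_coo = coo_matrix(A_dense)  # (row, col, value) triplets
A_csr = csr_matrix(A_dense)  # row pointers + column indices + values
A_csc = csc_matrix(A_dense)  # column pointers + row indices + values

# SpMM: sparse (m x k) times dense (k x n) gives dense (m x n).
B = np.ones((8, 3))
C = A_csr @ B  # SciPy performs a CSR-based SpMM here

def satisfies_2_4(dense):
    """Check the 2:4 pattern: <= 2 non-zeros per aligned group of 4 in a row.
    Assumes the number of columns is a multiple of 4."""
    groups = dense.reshape(dense.shape[0], -1, 4)
    return bool(np.all((groups != 0).sum(axis=2) <= 2))
```

All three format objects represent the same matrix; which one performs best in cuSPARSE depends on the sparsity level, which is exactly the comparison the paper carries out.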