Analysis of Several Sparse Formats for Matrices used in Sparse-Matrix Dense-Matrix Multiplication for Machine Learning on GPUs

Donghyeon Kim, Jinsung Kim
DOI: 10.1109/ICTC55196.2022.9952814
Published in: 2022 13th International Conference on Information and Communication Technology Convergence (ICTC)
Publication date: 2022-10-19
Citations: 2

Abstract

Sparse-matrix dense-matrix multiplication (SpMM) takes one sparse matrix and one dense matrix as inputs and outputs one dense matrix as the result. It plays a vital role in various fields such as deep neural networks, graph neural networks, and graph analysis. CUDA, NVIDIA's parallel computing platform, provides the cuSPARSE library to support Basic Linear Algebra Subroutines (BLAS) on sparse matrices, including SpMM. In sparse matrices, zero values can be discarded from storage and computation to accelerate execution. To represent only the non-zero values of a sparse matrix, the cuSPARSE library supports several sparse formats, such as COO (COOrdinate), CSR (Compressed Sparse Row), and CSC (Compressed Sparse Column). In addition, since the 3rd-generation Tensor Cores were introduced with the Ampere architecture, CUDA provides the cuSPARSELt library for SpMM whose sparse matrix satisfies a 2:4 sparsity pattern, i.e., approximately 50% sparsity, which can occur in machine learning. In this paper, we compare the cuSPARSE library and the cuSPARSELt library for SpMM in the case of sparse matrices with a 2:4 sparsity pattern (50% sparsity). Furthermore, we compare the performance of the three formats for SpMM in the cuSPARSE library at different sparsity levels: 75%, 87.5%, and 99%.
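The ideas in the abstract can be illustrated on the CPU with SciPy: the same three formats (COO, CSR, CSC) store only non-zero values with format-specific index arrays, SpMM multiplies a sparse matrix by a dense one, and the 2:4 sparsity pattern means every aligned group of 4 consecutive values in a row contains at most 2 non-zeros. This is a minimal sketch for intuition only, not the cuSPARSE/cuSPARSELt API; the matrix values and the helper `satisfies_2_4` are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix, csc_matrix

# A small matrix with a 2:4 sparsity pattern (~50% sparsity): each aligned
# group of 4 values in a row has at most 2 non-zeros. (Illustrative values.)
A_dense = np.array([
    [1.0, 0.0, 2.0, 0.0,  0.0, 3.0, 0.0, 4.0],
    [0.0, 5.0, 0.0, 6.0,  7.0, 0.0, 8.0, 0.0],
])

# The three formats store only the non-zeros, with different index arrays:
A_coo = coo_matrix(A_dense)  # (row, col, value) triplets
A_csr = csr_matrix(A_dense)  # row pointers + column indices + values
A_csc = csc_matrix(A_dense)  # column pointers + row indices + values

# SpMM: sparse (m x k) times dense (k x n) gives dense (m x n).
B = np.ones((8, 3))
C = A_csr @ B  # SciPy performs a CSR-based SpMM here

def satisfies_2_4(dense):
    """Check the 2:4 pattern: <= 2 non-zeros per aligned group of 4 in a row.
    Assumes the number of columns is a multiple of 4."""
    groups = dense.reshape(dense.shape[0], -1, 4)
    return bool(np.all((groups != 0).sum(axis=2) <= 2))
```

All three format objects represent the same matrix; which one performs best in cuSPARSE depends on the sparsity level, which is exactly the comparison the paper carries out.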