{"title":"Accelerating Sparse Matrix Vector Multiplication in Iterative Methods Using GPU","authors":"Kiran Kumar Matam, Kishore Kothapalli","doi":"10.1109/ICPP.2011.82","DOIUrl":null,"url":null,"abstract":"Multiplying a sparse matrix with a vector (spmv for short) is a fundamental operation in many linear algebra kernels. Having an efficient spmv kernel on modern architectures such as the GPUs is therefore of principal interest. The computational challenges that spmv poses are significantlydifferent compared to that of the dense linear algebra kernels. Recent work in this direction has focused on designing data structures to represent sparse matrices so as to improve theefficiency of spmv kernels. However, as the nature of sparseness differs across sparse matrices, there is no clear answer as to which data structure to use given a sparse matrix. In this work, we address this problem by devising techniques to understand the nature of the sparse matrix and then choose appropriate data structures accordingly. By using our technique, we are able to improve the performance of the spmv kernel on an Nvidia Tesla GPU (C1060) by a factor of up to80% in some instances, and about 25% on average compared to the best results of Bell and Garland [3] on the standard dataset (cf. Williams et al. SC'07) used in recent literature. We also use our spmv in the conjugate gradient method and show an average 20% improvement compared to using HYB spmv of [3], on the dataset obtained from the The University of Florida Sparse Matrix Collection [9].","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"49","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2011.82","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 49
Abstract
Multiplying a sparse matrix with a vector (spmv for short) is a fundamental operation in many linear algebra kernels. Having an efficient spmv kernel on modern architectures such as GPUs is therefore of principal interest. The computational challenges that spmv poses are significantly different from those of dense linear algebra kernels. Recent work in this direction has focused on designing data structures to represent sparse matrices so as to improve the efficiency of spmv kernels. However, as the nature of sparseness differs across sparse matrices, there is no clear answer as to which data structure to use for a given sparse matrix. In this work, we address this problem by devising techniques to understand the nature of the sparse matrix and then choose appropriate data structures accordingly. By using our technique, we are able to improve the performance of the spmv kernel on an Nvidia Tesla GPU (C1060) by up to 80% in some instances, and about 25% on average, compared to the best results of Bell and Garland [3] on the standard dataset (cf. Williams et al. SC'07) used in recent literature. We also use our spmv in the conjugate gradient method and show an average 20% improvement compared to using the HYB spmv of [3], on the dataset obtained from the University of Florida Sparse Matrix Collection [9].
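To make the operation concrete, below is a minimal, self-contained CUDA sketch of spmv (y = A·x) with the matrix stored in the common CSR format, using one thread per row. This is only an illustration of the kind of kernel the abstract refers to; the kernel name, the tiny example matrix, and all launch parameters are hypothetical choices for this sketch, not the authors' tuned implementation (which selects among formats such as the HYB format of Bell and Garland [3] based on the sparsity structure).

```cuda
// spmv_csr_sketch.cu -- illustrative CSR spmv, one thread per row (not the paper's kernel).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void csr_spmv(int num_rows, const int *row_ptr, const int *col_idx,
                         const float *vals, const float *x, float *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < num_rows) {
        float dot = 0.0f;
        // Accumulate this row's nonzeros against the matching entries of x.
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            dot += vals[j] * x[col_idx[j]];
        y[row] = dot;
    }
}

int main()
{
    // 3x3 example matrix in CSR form:
    // [ 4 0 1 ]
    // [ 0 3 0 ]
    // [ 2 0 5 ]
    const int n = 3, nnz = 5;
    int   h_row_ptr[n + 1] = {0, 2, 3, 5};
    int   h_col_idx[nnz]   = {0, 2, 1, 0, 2};
    float h_vals[nnz]      = {4, 1, 3, 2, 5};
    float h_x[n]           = {1, 1, 1};
    float h_y[n];

    int *d_row_ptr, *d_col_idx; float *d_vals, *d_x, *d_y;
    cudaMalloc(&d_row_ptr, (n + 1) * sizeof(int));
    cudaMalloc(&d_col_idx, nnz * sizeof(int));
    cudaMalloc(&d_vals, nnz * sizeof(float));
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_row_ptr, h_row_ptr, (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col_idx, h_col_idx, nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_vals, h_vals, nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    csr_spmv<<<1, 128>>>(n, d_row_ptr, d_col_idx, d_vals, d_x, d_y);
    cudaMemcpy(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y = [%g %g %g]\n", h_y[0], h_y[1], h_y[2]);  // expected: [5 3 7]

    cudaFree(d_row_ptr); cudaFree(d_col_idx); cudaFree(d_vals);
    cudaFree(d_x); cudaFree(d_y);
    return 0;
}
```

In an iterative solver such as conjugate gradient, a kernel like this is invoked once per iteration, which is why format-aware spmv tuning translates directly into end-to-end solver speedups.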