Predicting an Optimal Sparse Matrix Format for SpMV Computation on GPU

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI:10.1109/IPDPSW.2014.160

B. Neelima, G. R. M. Reddy, Prakash S. Raghavendra


Citations: 18

Abstract

Graphics Processing Units (GPUs), built on a many-threaded architecture, deliver high performance for general-purpose computation. A GPU hides memory latency by overlapping work: while one warp (a group of 32 threads) computes, other warps perform their memory accesses. For irregular, memory-bound applications such as Sparse Matrix-Vector Multiplication (SpMV), however, memory access times are high, so improving the performance of such applications on GPU is a challenging research problem. Optimizing SpMV time on GPU also matters for iterative applications such as Jacobi and conjugate gradient. The overheads incurred while computing SpMV on GPU must be considered as well: transforming the input matrix into the desired format and transferring the data from CPU to GPU are both nontrivial. If the chosen format does not suit the given input sparse matrix, the desired performance improvement cannot be achieved. Motivated by this observation, this paper proposes a method to choose an optimal sparse matrix format, focusing on applications where CPU-to-GPU communication time and pre-processing time are nontrivial. The experimental results show that the format predicted by the model matches the actual best-performing format when total SpMV time, comprising pre-processing time, CPU-to-GPU communication time, and SpMV computation time on GPU, is taken into account. The model predicts an optimal format for any given input sparse matrix with very small prediction overhead within an application. Compared with choosing a format that achieves high performance on the GPU alone, our approach is more comprehensive and valuable. The paper also proposes a sparse matrix format that optimizes communication and pre-processing overhead, for use when these overheads are nontrivial.
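The selection criterion in the abstract, total SpMV time rather than GPU kernel time alone, can be illustrated with a minimal Python sketch. This is not the paper's prediction model, which the abstract does not describe. The class `FormatCost`, the function `pick_format`, the iteration count, the format names (CSR and ELL), and all timings are hypothetical, chosen only to show how one-time overheads can change which format wins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatCost:
    """Hypothetical per-format timings in milliseconds (illustrative values only)."""
    name: str
    preprocess_ms: float  # one-time conversion of the input matrix into this format
    transfer_ms: float    # one-time CPU-to-GPU copy of the converted matrix
    spmv_ms: float        # time of a single SpMV computation on the GPU

    def total_ms(self, iterations: int) -> float:
        # Total time = pre-processing + communication + repeated SpMV computation.
        return self.preprocess_ms + self.transfer_ms + iterations * self.spmv_ms


def pick_format(costs, iterations):
    """Return the candidate with the smallest total time for the given iteration count."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return min(costs, key=lambda c: c.total_ms(iterations))


# Assumed numbers: ELL has the faster kernel but a costlier conversion and copy.
candidates = [
    FormatCost("CSR", preprocess_ms=0.0, transfer_ms=4.0, spmv_ms=1.0),
    FormatCost("ELL", preprocess_ms=20.0, transfer_ms=6.0, spmv_ms=0.5),
]

for n in (10, 100):
    best = pick_format(candidates, n)
    print(f"{n} iterations: {best.name} ({best.total_ms(n):.1f} ms total)")
```

With these assumed numbers, the format with the slower kernel wins for a short run because its overheads are lower, and the format with the faster kernel wins once enough iterations amortize the conversion and transfer cost.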

