AMF-CSR: Adaptive Multi-Row Folding of CSR for SpMV on GPU
Jianhua Gao, Weixing Ji, Jie Liu, Senhao Shao, Yizhuo Wang, Feng Shi
2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS), published 2021-12-01
DOI: 10.1109/ICPADS53394.2021.00058 (https://doi.org/10.1109/ICPADS53394.2021.00058)
Abstract
Sparse matrix-vector multiplication (SpMV) is a cost-dominant operation in many iterative methods for solving large-scale sparse linear systems. However, SpMV accesses the multiplied vector irregularly, which leads to poor data locality and degrades performance. This paper presents an adaptive multi-row folding of CSR (AMF-CSR) format for SpMV on GPUs. The new storage format folds a variable number of rows together in order to achieve better load balancing in computation. AMF-CSR increases the density of nonzero elements in a folded row, thereby improving access locality for the multiplied vector, and it merges an approximately equal number of nonzero elements into each folded row, thereby achieving load balancing. A performance evaluation on 28 sparse matrices shows that the proposed AMF-CSR-based SpMV algorithm achieves maximum speedups of 4.11x and 3.62x on the GTX 1080 Ti and Tesla V100, respectively, over a fixed multi-row folding-based SpMV algorithm. Evaluation results on 450 regular and 450 irregular sparse matrices also show that AMF-CSR outperforms other SpMV implementations.
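To make the terms in the abstract concrete, here is a minimal CPU sketch in Python of baseline CSR SpMV, followed by a simple way of grouping a variable number of consecutive rows so that each group holds roughly the same number of nonzeros. This is not the authors' AMF-CSR algorithm or its GPU kernel, neither of which is described in the abstract. The function names `csr_spmv` and `group_rows_by_nnz`, the `target_nnz` parameter, and the greedy grouping rule are all assumptions made for illustration.

```python
from typing import List, Tuple


def csr_spmv(row_ptr: List[int], col_idx: List[int],
             vals: List[float], x: List[float]) -> List[float]:
    """Baseline CSR SpMV, y = A @ x, processing one row at a time."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # col_idx[k] can jump around, so reads of x are irregular.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def group_rows_by_nnz(row_ptr: List[int],
                      target_nnz: int) -> List[Tuple[int, int]]:
    """Hypothetical greedy grouping (an assumption, not the paper's method).

    Packs consecutive rows into half-open ranges [start, end) and closes a
    group as soon as it holds at least target_nnz nonzeros.
    """
    n_rows = len(row_ptr) - 1
    groups = []
    start = 0
    for i in range(n_rows):
        if row_ptr[i + 1] - row_ptr[start] >= target_nnz:
            groups.append((start, i + 1))
            start = i + 1
    if start < n_rows:
        groups.append((start, n_rows))  # leftover rows form the last group
    return groups


# A 4x4 example with row nonzero counts 2, 1, 3, 1.
row_ptr = [0, 2, 3, 6, 7]
col_idx = [0, 2, 1, 0, 1, 3, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0]

print(csr_spmv(row_ptr, col_idx, vals, x))   # [7.0, 6.0, 38.0, 28.0]
print(group_rows_by_nnz(row_ptr, 3))         # [(0, 2), (2, 3), (3, 4)]
```

In the example, rows 0 and 1 (2 + 1 nonzeros) end up in one group while the denser row 2 forms a group by itself, which illustrates the idea of folding a variable number of rows to even out the work per group. A fixed scheme would instead always group the same number of rows regardless of how many nonzeros they contain.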