An Optimized GP-GPU Warp Scheduling Algorithm for Sparse Matrix-Vector Multiplication
Lifeng Liu, Meilin Liu, Chong-Jun Wang
2013 IEEE Eighth International Conference on Networking, Architecture and Storage, 2013-07-17
DOI: 10.1109/NAS.2013.35 (https://doi.org/10.1109/NAS.2013.35)
Abstract
GP-GPUs have become the platform for many applications because of their computational power and massive parallelism. In this paper, we first investigate the CSR sparse matrix format and the performance of existing optimized sparse matrix-vector multiplication (SpMV) algorithms, and we analyze the memory access patterns of these algorithms. Based on this analysis, we propose a new thread scheduling technique that exploits inter-warp and intra-warp locality simultaneously and achieves memory coalescing automatically. The proposed technique significantly changes the memory access pattern of SpMV. Simulation results show that SpMV using the proposed thread scheduling technique performs much better than SpMV implementations optimized by other techniques.
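To make the CSR layout and the access patterns mentioned above concrete, here is a minimal Python sketch. It is not the scheduling technique proposed in the paper. It only illustrates a reference CSR SpMV and which positions of the value array a group of threads would read at the same step under two common mappings (one thread per row, and one warp per row). All function names and the example matrix are assumptions made for illustration.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row occupy vals[row_ptr[row] : row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += vals[k] * x[col_idx[k]]
    return y


def thread_per_row_accesses(row_ptr, step):
    """Indices into vals read at `step` when thread i processes row i.

    Neighbouring threads start at different row offsets, so the indices
    are generally scattered (poorly coalesced).
    """
    return [
        row_ptr[r] + step
        for r in range(len(row_ptr) - 1)
        if row_ptr[r] + step < row_ptr[r + 1]
    ]


def warp_per_row_accesses(row_ptr, row, width, step):
    """Indices into vals read at `step` when `width` threads share one row.

    The threads read consecutive elements of the row (coalesced).
    """
    start = row_ptr[row] + step * width
    return list(range(start, min(start + width, row_ptr[row + 1])))


# Hypothetical 4x4 example matrix:
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [4, 0, 5, 6],
#  [0, 0, 0, 7]]
row_ptr = [0, 2, 3, 6, 7]
col_idx = [0, 2, 1, 0, 2, 3, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

y = csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0, 1.0])
scattered = thread_per_row_accesses(row_ptr, 0)
contiguous = warp_per_row_accesses(row_ptr, 2, 4, 0)
```

With one thread per row, the first step touches positions 0, 2, 3 and 6 of the value array, whereas a group of threads sharing row 2 touches the consecutive positions 3, 4 and 5. The gap between these two patterns is the kind of effect that an analysis of SpMV memory access has to account for.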