Towards Efficient SpMV on Sunway Manycore Architectures
Authors: Changxi Liu, Biwei Xie, Xin Liu, Wei Xue, Hailong Yang, Xu Liu
Published in: Proceedings of the 2018 International Conference on Supercomputing, 2018-06-12
DOI: 10.1145/3205289.3205313 (https://doi.org/10.1145/3205289.3205313)
Citations: 44
Abstract
Sparse Matrix-Vector Multiplication (SpMV) is an essential computation kernel for many data-analytic workloads running in both supercomputers and data centers. The intrinsic irregularity of SpMV makes high performance hard to achieve, especially when porting to new architectures. In this paper, we present our work on designing and implementing efficient SpMV algorithms on Sunway, a novel architecture with many unique features. To fully exploit the Sunway architecture, we have designed a dual-side multi-level partition mechanism that applies to both sparse matrices and hardware resources in order to improve locality and parallelism. On the matrix side, we partition sparse matrices into blocks, tiles, and slices, which provide different granularities. On the hardware side, we partition the cores of a Sunway processor into fleets, and within each fleet we dedicate some cores to computation and others to I/O. Moreover, we have optimized the communication between partitions to further improve performance. Our scheme is generally applicable to different SpMV formats and implementations. For evaluation, we have applied our techniques atop a popular SpMV format, CSR. Experimental results on 18 datasets show that our optimization yields speedups of up to 15.5x (12.3x on average).
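For readers unfamiliar with the CSR format named above, the following is a minimal sketch of a textbook CSR SpMV kernel, together with a variant that processes the matrix one block of consecutive rows at a time. It is not the paper's implementation: the function names, the `rows_per_block` parameter, and the small example matrix are assumptions chosen for illustration, and the row-block loop only hints at the coarsest level of partitioning, not at the blocks, tiles, slices, or fleets the abstract describes.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # The nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # The indirect access x[col_idx[k]] is what makes SpMV irregular.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_spmv_row_blocks(row_ptr: Sequence[int], col_idx: Sequence[int],
                        values: Sequence[float], x: Sequence[float],
                        rows_per_block: int) -> List[float]:
    """Same result as csr_spmv, computed one block of consecutive rows at a time.

    rows_per_block is a hypothetical tuning parameter for this sketch.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for start in range(0, n_rows, rows_per_block):
        stop = min(start + rows_per_block, n_rows)
        # Each row block writes a disjoint part of y, so blocks are independent work units.
        for row in range(start, stop):
            y[row] = sum(values[k] * x[col_idx[k]]
                         for k in range(row_ptr[row], row_ptr[row + 1]))
    return y


# Example matrix (an assumption for illustration):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

y_plain = csr_spmv(row_ptr, col_idx, values, x)
y_blocked = csr_spmv_row_blocks(row_ptr, col_idx, values, x, rows_per_block=2)
```

Because each row block touches a disjoint range of the output vector, blocks can in principle be assigned to different groups of cores without write conflicts; how the work is actually mapped onto Sunway hardware is described in the paper itself, not in this sketch.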