hsSpMV: A Heterogeneous and SPM-aggregated SpMV for SW26010-Pro many-core processor
J. Pan, Lei Xiao, Min Tian, Li Wang, Chaochao Yang, Renjiang Chen, Zenghui Ren, Anjun Liu, Guanghui Zhu
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), May 2023. DOI: 10.1109/CCGrid57682.2023.00016
Sparse matrix-vector multiplication (SpMV) is a critical performance bottleneck in numerical simulation and artificial intelligence training. The new-generation Sunway supercomputer is China's leading exascale system, and its SW26010-Pro many-core processor is a competitive candidate for both workloads thanks to its attractive computational power. In this paper, we propose hsSpMV, a heterogeneous and SPM-aggregated SpMV kernel designed specifically for the SW26010-Pro many-core processor. To fully exploit the computational power of the SW26010-Pro and balance the load across core groups (CGs) during computation, we employ an asynchronous computation workflow and propose an SPM-aggregated strategy together with a vector adaptive mapping algorithm. In addition, we propose a two-level data partition scheme to balance the computational load. To improve memory access efficiency, we replace discrete memory accesses with direct memory access via the DMA controller. With these optimizations, we achieve a 77.16× speedup over the original implementation. Our experimental results show that hsSpMV yields up to a 3.82× speedup on average over the SpMV kernel of xMath2.0, the state-of-the-art Sunway math library.
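For context on the kernel being optimized, the sketch below shows a minimal baseline CSR-format SpMV in C. It is purely illustrative: it does not reflect the paper's SW26010-Pro-specific implementation (SPM aggregation, DMA transfers, or core-group-level partitioning), and the function and variable names are assumptions for this example only.

```c
/* Minimal baseline CSR SpMV: y = A * x.
 * Illustrative only -- does not model the SW26010-Pro optimizations
 * (SPM aggregation, DMA transfers, core-group partitioning) from the paper. */
#include <stdio.h>

void spmv_csr(int n_rows,
              const int *row_ptr,    /* size n_rows + 1 */
              const int *col_idx,    /* size nnz */
              const double *values,  /* size nnz */
              const double *x,
              double *y)
{
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* Accumulate the nonzeros of row i. */
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
            sum += values[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

int main(void)
{
    /* 3x3 sparse matrix:
     * [ 4 0 1 ]
     * [ 0 2 0 ]
     * [ 3 0 5 ] */
    int    row_ptr[] = {0, 2, 3, 5};
    int    col_idx[] = {0, 2, 1, 0, 2};
    double values[]  = {4.0, 1.0, 2.0, 3.0, 5.0};
    double x[]       = {1.0, 1.0, 1.0};
    double y[3];

    spmv_csr(3, row_ptr, col_idx, values, x, y);
    printf("y = [%g, %g, %g]\n", y[0], y[1], y[2]);  /* expected: [5, 2, 8] */
    return 0;
}
```

Because the number of nonzeros per row is irregular, a naive split of rows across cores leads to load imbalance and indirect, scattered accesses to x; techniques such as the paper's two-level data partition and DMA-based bulk transfers target exactly these issues.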