Weicai Ye, Chenghuan Huang, Jiasheng Huang, Jiajun Li, Yao Lu, Ying Jiang
{"title":"面向积分方程的矢量化SpMV算法及其在CT成像重建中的应用","authors":"Weicai Ye, Chenghuan Huang, Jiasheng Huang, Jiajun Li, Yao Lu, Ying Jiang","doi":"10.1109/ipdps53621.2022.00080","DOIUrl":null,"url":null,"abstract":"Sparse-matrix vector multiplication (SpMV) is a core routine in many applications. Its performance is limited by memory bandwidth, which is for matrix transport between processors and memory, and instruction latency in computations. Vectorized operations (SIMD) can dramatically improve the execution efficiency, but irregular matrices' sparsity pattern is not compatible with the style of SIMD execution. We present a new matrix format, Compressed Sparse Column Vector (CSCV), and a corresponding vectorized SpMV algorithm for matrices arising from integral equations. This SpMV algorithm can inherently suit wide SIMD instructions and reduce the memory bandwidth used. We implement this algorithm for Computed Tomography (CT) imaging reconstructions on both Intel and AMD x86 platforms and compare it with seven state-of-the-art SpMV implementations using different CT imaging matrices. Experimental results show that CSCV can achieve up to 96.9 GFLOP/s in single-precision tests, with speedup 3.70× to MKL and 3.48× to the second place implementation. Furthermore, the implementation of CSCV SpMV is performance portable, which excludes almost all SIMD assemble code and has promising performance with compiler-assisted vectorization. Code Availability: https://github.com/sysu-compsci/cscv","PeriodicalId":321801,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Integral-equation-oriented Vectorized SpMV Algorithm and its Application on CT Imaging Reconstruction\",\"authors\":\"Weicai Ye, Chenghuan Huang, Jiasheng Huang, Jiajun Li, Yao Lu, Ying Jiang\",\"doi\":\"10.1109/ipdps53621.2022.00080\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sparse-matrix vector multiplication (SpMV) is a core routine in many applications. Its performance is limited by memory bandwidth, which is for matrix transport between processors and memory, and instruction latency in computations. Vectorized operations (SIMD) can dramatically improve the execution efficiency, but irregular matrices' sparsity pattern is not compatible with the style of SIMD execution. We present a new matrix format, Compressed Sparse Column Vector (CSCV), and a corresponding vectorized SpMV algorithm for matrices arising from integral equations. This SpMV algorithm can inherently suit wide SIMD instructions and reduce the memory bandwidth used. We implement this algorithm for Computed Tomography (CT) imaging reconstructions on both Intel and AMD x86 platforms and compare it with seven state-of-the-art SpMV implementations using different CT imaging matrices. Experimental results show that CSCV can achieve up to 96.9 GFLOP/s in single-precision tests, with speedup 3.70× to MKL and 3.48× to the second place implementation. Furthermore, the implementation of CSCV SpMV is performance portable, which excludes almost all SIMD assemble code and has promising performance with compiler-assisted vectorization. Code Availability: https://github.com/sysu-compsci/cscv\",\"PeriodicalId\":321801,\"journal\":{\"name\":\"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ipdps53621.2022.00080\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ipdps53621.2022.00080","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Integral-equation-oriented Vectorized SpMV Algorithm and its Application on CT Imaging Reconstruction
Sparse-matrix vector multiplication (SpMV) is a core routine in many applications. Its performance is limited by memory bandwidth, which is for matrix transport between processors and memory, and instruction latency in computations. Vectorized operations (SIMD) can dramatically improve the execution efficiency, but irregular matrices' sparsity pattern is not compatible with the style of SIMD execution. We present a new matrix format, Compressed Sparse Column Vector (CSCV), and a corresponding vectorized SpMV algorithm for matrices arising from integral equations. This SpMV algorithm can inherently suit wide SIMD instructions and reduce the memory bandwidth used. We implement this algorithm for Computed Tomography (CT) imaging reconstructions on both Intel and AMD x86 platforms and compare it with seven state-of-the-art SpMV implementations using different CT imaging matrices. Experimental results show that CSCV can achieve up to 96.9 GFLOP/s in single-precision tests, with speedup 3.70× to MKL and 3.48× to the second place implementation. Furthermore, the implementation of CSCV SpMV is performance portable, which excludes almost all SIMD assemble code and has promising performance with compiler-assisted vectorization. Code Availability: https://github.com/sysu-compsci/cscv