稀疏数据结构的并行转置

Proceedings of the 2016 International Conference on Supercomputing Pub Date : 2016-06-01 DOI:10.1145/2925426.2926291

Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng

{"title":"稀疏数据结构的并行转置","authors":"Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng","doi":"10.1145/2925426.2926291","DOIUrl":null,"url":null,"abstract":"Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition for sparse matrices and graphs, have not received the attention they deserve. In this paper, we first identify that the transposition operation can be a bottleneck of some fundamental sparse matrix and graph algorithms. Then, we revisit the performance and scalability of parallel transposition approaches on x86-based multi-core and many-core processors. Based on the insights obtained, we propose two new parallel transposition algorithms: ScanTrans and MergeTrans. The experimental results show that our ScanTrans method achieves an average of 2.8-fold (up to 6.2-fold) speedup over the parallel transposition in the latest vendor-supplied library on an Intel multi-core CPU platform, and the MergeTrans approach achieves on average of 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor.","PeriodicalId":422112,"journal":{"name":"Proceedings of the 2016 International Conference on Supercomputing","volume":"192 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"44","resultStr":"{\"title\":\"Parallel Transposition of Sparse Data Structures\",\"authors\":\"Hao Wang, Weifeng Liu, Kaixi Hou, Wu-chun Feng\",\"doi\":\"10.1145/2925426.2926291\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition for sparse matrices and graphs, have not received the attention they deserve. In this paper, we first identify that the transposition operation can be a bottleneck of some fundamental sparse matrix and graph algorithms. Then, we revisit the performance and scalability of parallel transposition approaches on x86-based multi-core and many-core processors. Based on the insights obtained, we propose two new parallel transposition algorithms: ScanTrans and MergeTrans. The experimental results show that our ScanTrans method achieves an average of 2.8-fold (up to 6.2-fold) speedup over the parallel transposition in the latest vendor-supplied library on an Intel multi-core CPU platform, and the MergeTrans approach achieves on average of 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor.\",\"PeriodicalId\":422112,\"journal\":{\"name\":\"Proceedings of the 2016 International Conference on Supercomputing\",\"volume\":\"192 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"44\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2016 International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2925426.2926291\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2925426.2926291","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 44

摘要

计算科学和社会科学中的许多应用都利用了所获取数据的稀疏性和连通性。尽管许多并行稀疏原语(如稀疏矩阵向量乘法)已经得到了广泛的研究，但其他一些重要的构建模块，如稀疏矩阵和图的并行转置，却没有得到应有的重视。在本文中，我们首先确定了转置运算可能是一些基本稀疏矩阵和图算法的瓶颈。然后，我们重新讨论了并行转置方法在基于x86的多核和多核处理器上的性能和可扩展性。在此基础上，我们提出了两种新的并行换位算法:ScanTrans和MergeTrans。实验结果表明，在英特尔多核CPU平台上，我们的ScanTrans方法在最新供应商提供的库中实现了平均2.8倍(最高6.2倍)的并行转置加速，而MergeTrans方法在英特尔Xeon Phi多核处理器上实现了平均3.4倍(最高11.7倍)的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Parallel Transposition of Sparse Data Structures

Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition for sparse matrices and graphs, have not received the attention they deserve. In this paper, we first identify that the transposition operation can be a bottleneck of some fundamental sparse matrix and graph algorithms. Then, we revisit the performance and scalability of parallel transposition approaches on x86-based multi-core and many-core processors. Based on the insights obtained, we propose two new parallel transposition algorithms: ScanTrans and MergeTrans. The experimental results show that our ScanTrans method achieves an average of 2.8-fold (up to 6.2-fold) speedup over the parallel transposition in the latest vendor-supplied library on an Intel multi-core CPU platform, and the MergeTrans approach achieves on average of 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2016 International Conference on Supercomputing

自引率

0.00%

发文量