Efficient implementation of sparse matrix-sparse vector multiplication for large scale graph analytics

2019 IEEE High Performance Extreme Computing Conference (HPEC). Pub date: 2019-09-01. DOI: 10.1109/HPEC.2019.8916413

M. Serrano


Citations: 1

Abstract



We developed a parallel algorithm that improves the cache behavior and overall performance of sparse matrix-sparse vector multiplication (SpMSpV), an operation used increasingly in large graph analytics, particularly for dynamic graphs in social network and homeland security applications. The proposed algorithm builds on the two-phase approach that partitions the multiplication into a scaling phase and an aggregation phase, so that each phase individually achieves a more cache-friendly access pattern [6], [3]. However, to handle dynamic graphs and achieve better load balancing in a parallel implementation, we use a combination of private and shared bins, with synchronized access to the shared bins, to exchange product terms between the two phases. Each thread accumulates product terms in its own private bins; when a private bin becomes full, the algorithm performs a bulk transfer from that private bin to a shared bin. The results are then aggregated from the shared bins. In addition, we use heuristics to select the best SpMSpV algorithm based on the number of nonzeros involved in the operation. When the number of nonzeros is large, it may be better to perform the operation as SpMV (sparse matrix times dense vector) despite the added conversion cost; when the number of nonzeros is low, a simplified algorithm is advantageous. We compared our algorithm with existing SpMSpV algorithms, and our evaluation shows that execution time is reduced several-fold for large graphs.
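The scheme outlined in the abstract (a scaling phase that emits product terms into per-thread private bins, locked bulk transfers into shared bins, an aggregation phase over the shared bins, and a nonzero-count heuristic that chooses between SpMV, a simplified algorithm, and the binned algorithm) can be illustrated with the minimal Python sketch below. It is a hypothetical reconstruction, not the paper's implementation. It assumes the matrix is stored in CSC form, that each bin covers a fixed contiguous range of rows (`rows_per_bin`), that a private bin holds `private_cap` terms before it is flushed, that nonzeros of `x` are dealt round-robin to threads, and that the dispatcher `spmspv` compares the number of product terms against made-up thresholds `small_terms` and `dense_frac`. All names and parameters are illustrative. Python threads are used only to show the synchronization pattern; under the GIL they do not reproduce the cache or speedup effects the paper measures.

```python
import threading
from dataclasses import dataclass

@dataclass
class CSCMatrix:
    """Sparse matrix in compressed sparse column (CSC) form (assumed layout)."""
    n_rows: int
    n_cols: int
    col_ptr: list  # column j occupies positions col_ptr[j]:col_ptr[j + 1]
    row_idx: list  # row index of each stored nonzero
    vals: list     # value of each stored nonzero

    @classmethod
    def from_dense(cls, rows):
        n_rows, n_cols = len(rows), len(rows[0])
        col_ptr, row_idx, vals = [0], [], []
        for j in range(n_cols):
            for i in range(n_rows):
                if rows[i][j] != 0:
                    row_idx.append(i)
                    vals.append(float(rows[i][j]))
            col_ptr.append(len(vals))
        return cls(n_rows, n_cols, col_ptr, row_idx, vals)

    def column(self, j):
        lo, hi = self.col_ptr[j], self.col_ptr[j + 1]
        return zip(self.row_idx[lo:hi], self.vals[lo:hi])  # (row, value) pairs

def spmspv_simple(A, x):
    """Simplified path: accumulate A[:, j] * x_j directly into a hash map."""
    y = {}
    for j, xj in x.items():
        for i, aij in A.column(j):
            y[i] = y.get(i, 0.0) + aij * xj
    return y

def spmv_dense(A, x):
    """SpMV path: pay the conversion cost of densifying x, then visit every column."""
    xd = [0.0] * A.n_cols
    for j, v in x.items():
        xd[j] = v
    y = [0.0] * A.n_rows
    for j in range(A.n_cols):
        for i, aij in A.column(j):
            y[i] += aij * xd[j]
    return {i: v for i, v in enumerate(y) if v != 0.0}  # back to sparse form

def spmspv_binned(A, x, n_threads=2, rows_per_bin=4, private_cap=8):
    """Two-phase SpMSpV with per-thread private bins and locked shared bins."""
    n_bins = (A.n_rows + rows_per_bin - 1) // rows_per_bin
    shared = [[] for _ in range(n_bins)]
    locks = [threading.Lock() for _ in range(n_bins)]
    nonzeros = list(x.items())

    def flush(b, private_bin):
        with locks[b]:  # synchronized bulk transfer into shared bin b
            shared[b].extend(private_bin)
        private_bin.clear()

    def scale(tid):
        # Scaling phase: this thread handles every n_threads-th nonzero of x.
        private = [[] for _ in range(n_bins)]
        for j, xj in nonzeros[tid::n_threads]:
            for i, aij in A.column(j):
                b = i // rows_per_bin
                private[b].append((i, aij * xj))
                if len(private[b]) == private_cap:  # private bin full
                    flush(b, private[b])
        for b in range(n_bins):  # flush partially filled bins at the end
            if private[b]:
                flush(b, private[b])

    threads = [threading.Thread(target=scale, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Aggregation phase: each shared bin only touches its own slice of rows.
    y = {}
    for b, terms in enumerate(shared):
        base = b * rows_per_bin
        acc = [0.0] * rows_per_bin  # small dense accumulator for this row slice
        for i, p in terms:
            acc[i - base] += p
        y.update({base + k: v for k, v in enumerate(acc) if v != 0.0})
    return y

def spmspv(A, x, small_terms=64, dense_frac=0.5, **kw):
    """Heuristic dispatch on the number of product terms (hypothetical thresholds)."""
    terms = sum(A.col_ptr[j + 1] - A.col_ptr[j] for j in x)
    if terms > dense_frac * len(A.vals):
        return spmv_dense(A, x)
    if terms < small_terms:
        return spmspv_simple(A, x)
    return spmspv_binned(A, x, **kw)
```

For example, `spmspv(CSCMatrix.from_dense([[1, 0, 2], [0, 3, 0], [4, 0, 5]]), {0: 1.0, 2: 2.0})` touches 4 of the 5 stored nonzeros, so with the default `dense_frac=0.5` it takes the SpMV path and returns `{0: 5.0, 2: 14.0}`. Grouping terms by contiguous row ranges is what keeps each aggregation step confined to a small slice of the output; the private bins let threads append without locking and take a lock only once per `private_cap` terms.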
