Evaluating the Performance Impact of Communication Imbalance in Sparse Matrix-Vector Multiplication

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. Pub Date: 2015-03-04. DOI: 10.1109/PDP.2015.37

G. Utrera, Marisa Gil, X. Martorell


Citations: 3

Abstract


HPC applications make intensive use of large sparse matrices, with the matrix-vector product accounting for a significant fraction of the total run time. These matrices are characterized by non-uniform structure and irregular memory accesses, which make it difficult to achieve good scalability on modern HPC platforms with multi- or many-core processors, SIMD units, and high-speed communication networks. One reason for this limited scalability is communication, owing to imbalance in both message synchronization and message size. In this work we analyze this load imbalance in the sparse matrix-vector product (SpMV) when running on a multi-node cluster with a high-speed interconnection network. Experimental alternatives for reducing communication load imbalance are evaluated on two programming models (MPI+fork-join and MPI+task-based parallelism) using certain optimizations (i.e., computation-communication overlap or parallel sending of messages). The performance improvement achieved for large matrix sizes can be up to 9%.
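To make the notion of imbalance in message size concrete, the following is a minimal sketch, not the authors' implementation. It assumes a square matrix stored in CSR form, a contiguous block-row partition across processes, and an input vector partitioned the same way; the function names `spmv_csr`, `recv_volume`, and `imbalance` are hypothetical. It counts how many remote vector entries each process would have to receive and reports the ratio of the largest volume to the mean.

```python
def spmv_csr(indptr, indices, data, x):
    """Sequential reference: y = A @ x for A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y


def recv_volume(indptr, indices, n_procs):
    """Distinct remote x entries each process must receive.

    Assumption: rows and x entries are split into contiguous blocks,
    so process p owns indices [p * block, (p + 1) * block).
    """
    n = len(indptr) - 1
    block = -(-n // n_procs)  # ceiling division
    volumes = []
    for p in range(n_procs):
        lo, hi = p * block, min((p + 1) * block, n)
        remote = set()
        for row in range(lo, hi):
            for k in range(indptr[row], indptr[row + 1]):
                col = indices[k]
                if col // block != p:  # x[col] lives on another process
                    remote.add(col)
        volumes.append(len(remote))
    return volumes


def imbalance(volumes):
    """Ratio of the largest receive volume to the mean (1.0 = balanced)."""
    mean = sum(volumes) / len(volumes)
    return max(volumes) / mean if mean else 1.0


# 4x4 example: row 0 is dense, the remaining rows are nearly diagonal.
indptr = [0, 4, 5, 6, 8]
indices = [0, 1, 2, 3, 1, 2, 2, 3]
data = [1.0] * 8

volumes = recv_volume(indptr, indices, n_procs=2)
print(volumes, imbalance(volumes))
```

In this toy case the process owning the dense row needs two remote entries while the other needs none, so one process waits on communication that the other never performs. This is the kind of irregularity that optimizations such as computation-communication overlap aim to hide.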
