大规模特征值计算中的正交并行层

IF 1.2 Q3 COMPUTER SCIENCE, THEORY & METHODS

ACM Transactions on Parallel Computing Pub Date : 2022-09-05 DOI:10.1145/3614444

A. Alvermann, G. Hager, H. Fehske

{"title":"大规模特征值计算中的正交并行层","authors":"A. Alvermann, G. Hager, H. Fehske","doi":"10.1145/3614444","DOIUrl":null,"url":null,"abstract":"We address the communication overhead of distributed sparse matrix-(multiple)-vector multiplication in the context of large-scale eigensolvers, using filter diagonalization as an example. The basis of our study is a performance model, which includes a communication metric that is computed directly from the matrix sparsity pattern without running any code. The performance model quantifies to which extent scalability and parallel efficiency are lost due to communication overhead. To restore scalability, we identify two orthogonal layers of parallelism in the filter diagonalization technique. In the horizontal layer the rows of the sparse matrix are distributed across individual processes. In the vertical layer bundles of multiple vectors are distributed across separate process groups. An analysis in terms of the communication metric predicts that scalability can be restored if, and only if, one implements the two orthogonal layers of parallelism via different distributed vector layouts. Our theoretical analysis is corroborated by benchmarks for application matrices from quantum and solid state physics, road networks, and nonlinear programming. We finally demonstrate the benefits of using orthogonal layers of parallelism with two exemplary application cases—an exciton and a strongly correlated electron system—which incur either small or large communication overhead.","PeriodicalId":42115,"journal":{"name":"ACM Transactions on Parallel Computing","volume":"10 1","pages":"1 - 31"},"PeriodicalIF":1.2000,"publicationDate":"2022-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Orthogonal Layers of Parallelism in Large-Scale Eigenvalue Computations\",\"authors\":\"A. Alvermann, G. Hager, H. Fehske\",\"doi\":\"10.1145/3614444\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We address the communication overhead of distributed sparse matrix-(multiple)-vector multiplication in the context of large-scale eigensolvers, using filter diagonalization as an example. The basis of our study is a performance model, which includes a communication metric that is computed directly from the matrix sparsity pattern without running any code. The performance model quantifies to which extent scalability and parallel efficiency are lost due to communication overhead. To restore scalability, we identify two orthogonal layers of parallelism in the filter diagonalization technique. In the horizontal layer the rows of the sparse matrix are distributed across individual processes. In the vertical layer bundles of multiple vectors are distributed across separate process groups. An analysis in terms of the communication metric predicts that scalability can be restored if, and only if, one implements the two orthogonal layers of parallelism via different distributed vector layouts. Our theoretical analysis is corroborated by benchmarks for application matrices from quantum and solid state physics, road networks, and nonlinear programming. We finally demonstrate the benefits of using orthogonal layers of parallelism with two exemplary application cases—an exciton and a strongly correlated electron system—which incur either small or large communication overhead.\",\"PeriodicalId\":42115,\"journal\":{\"name\":\"ACM Transactions on Parallel Computing\",\"volume\":\"10 1\",\"pages\":\"1 - 31\"},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2022-09-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Parallel Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3614444\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Parallel Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3614444","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 1

摘要

我们以滤波器对角化为例，解决了大规模特征求解中分布式稀疏矩阵-(多)向量乘法的通信开销。我们研究的基础是一个性能模型，其中包括一个通信度量，该度量直接从矩阵稀疏性模式计算而不运行任何代码。性能模型量化了由于通信开销而导致的可伸缩性和并行效率损失的程度。为了恢复可扩展性，我们在滤波器对角化技术中确定了两个正交的并行层。在水平层，稀疏矩阵的行分布在各个过程之间。在垂直层中，多个向量的束分布在不同的过程组中。根据通信度量的分析预测，当且仅当通过不同的分布式矢量布局实现两个正交的并行层时，可以恢复可伸缩性。我们的理论分析得到了来自量子和固态物理、道路网络和非线性规划的应用矩阵基准的证实。最后，我们通过两个示例应用案例(激子和强相关电子系统)演示了使用正交并行层的好处，这两个示例应用会导致或小或大的通信开销。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Orthogonal Layers of Parallelism in Large-Scale Eigenvalue Computations

We address the communication overhead of distributed sparse matrix-(multiple)-vector multiplication in the context of large-scale eigensolvers, using filter diagonalization as an example. The basis of our study is a performance model, which includes a communication metric that is computed directly from the matrix sparsity pattern without running any code. The performance model quantifies to which extent scalability and parallel efficiency are lost due to communication overhead. To restore scalability, we identify two orthogonal layers of parallelism in the filter diagonalization technique. In the horizontal layer the rows of the sparse matrix are distributed across individual processes. In the vertical layer bundles of multiple vectors are distributed across separate process groups. An analysis in terms of the communication metric predicts that scalability can be restored if, and only if, one implements the two orthogonal layers of parallelism via different distributed vector layouts. Our theoretical analysis is corroborated by benchmarks for application matrices from quantum and solid state physics, road networks, and nonlinear programming. We finally demonstrate the benefits of using orthogonal layers of parallelism with two exemplary application cases—an exciton and a strongly correlated electron system—which incur either small or large communication overhead.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Parallel Computing COMPUTER SCIENCE, THEORY & METHODS-

CiteScore

4.10

自引率

0.00%

发文量