Accelerating Parallel Hierarchical Matrix-Vector Products via Data-Driven Sampling

Lucas Erlandson, Difeng Cai, Yuanzhe Xi, Edmond Chow
DOI: 10.1109/IPDPS47924.2020.00082
Published in: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2020, pp. 749-758
Citations: 14

Abstract

Hierarchical matrices are scalable matrix representations particularly suited to the case where the matrix entries are defined by a smooth kernel function evaluated between pairs of points. In this paper, we present a new scheme to alleviate the computational bottlenecks present in many hierarchical matrix methods. For general kernel functions, a popular approach to construct hierarchical matrices is through interpolation, due to its efficiency compared to computationally expensive algebraic techniques. However, interpolation-based methods often lead to larger ranks, and do not scale well to higher dimensions. We propose a new data-driven method to resolve these issues. The new method is able to accomplish the rank reduction by using a surrogate for the global distribution of points. The surrogate is generated using a hierarchical data-driven sampling. As a result of the lower rank, the construction cost, memory requirements, and matrix-vector product costs decrease. Using state-of-the-art dimension-independent sampling, the new method makes it possible to tackle problems in higher dimensions. We also discuss an on-the-fly variation of hierarchical matrix construction and matrix-vector products that is able to reduce memory usage by an order of magnitude. This is accomplished by postponing the generation of certain intermediate matrices until they are used, generating them just in time. We provide results demonstrating the effectiveness of our improvements, both individually and in conjunction with each other. For a problem involving 320,000 points in 3D, our data-driven approach reduces the memory usage from 58.75 GiB using state-of-the-art methods (762.9 GiB if stored dense) to 18.60 GiB. In combination with our on-the-fly approach, we are able to reduce the total memory usage to 543.74 MiB.
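The memory and matvec savings described above come from replacing dense kernel blocks between well-separated point clusters with low-rank factors. The following is a minimal illustrative sketch of that core idea, not the paper's algorithm: it uses a truncated SVD as a stand-in factorization, whereas the paper builds the factors via data-driven sampling without ever forming the dense block. The kernel, point sets, and rank here are hypothetical choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 512, 8                       # points per cluster, target rank

# Two well-separated 1D point clusters and a smooth kernel 1/(1 + |x - y|).
x = rng.uniform(0.0, 1.0, n)
y = rng.uniform(4.0, 5.0, n)
K = 1.0 / (1.0 + np.abs(x[:, None] - y[None, :]))   # dense off-diagonal block

# Truncated SVD as a stand-in low-rank factorization K ≈ U @ Vt.
U_full, s, Vt_full = np.linalg.svd(K, full_matrices=False)
U = U_full[:, :r] * s[:r]           # n x r
Vt = Vt_full[:r, :]                 # r x n

v = rng.standard_normal(n)
dense = K @ v                       # O(n^2) work, O(n^2) storage
lowrank = U @ (Vt @ v)              # O(n r) work, O(n r) storage

rel_err = np.linalg.norm(dense - lowrank) / np.linalg.norm(dense)
print(f"storage ratio: {2 * n * r / (n * n):.3f}, relative error: {rel_err:.2e}")
```

Because the kernel is smooth and the clusters are well separated, the singular values of the block decay rapidly, so a small rank `r` reproduces the matvec to high accuracy at a fraction of the storage; a hierarchical matrix applies this blockwise over a cluster tree.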