Scalable Fast Multipole Accelerated Vortex Methods

2014 IEEE International Parallel & Distributed Processing Symposium Workshops Pub Date : 2014-05-19 DOI:10.1109/IPDPSW.2014.110

Qi Hu, N. Gumerov, Rio Yokota, L. Barba, R. Duraiswami

{"title":"Scalable Fast Multipole Accelerated Vortex Methods","authors":"Qi Hu, N. Gumerov, Rio Yokota, L. Barba, R. Duraiswami","doi":"10.1109/IPDPSW.2014.110","DOIUrl":null,"url":null,"abstract":"The fast multipole method (FMM) is often used to accelerate the calculation of particle interactions in particle-based methods to simulate incompressible flows. To evaluate the most time-consuming kernels -- the Biot-Savart equation and stretching term of the vorticity equation, we mathematically reformulated it so that only two Laplace scalar potentials are used instead of six. This automatically ensuring divergence-free far-field computation. Based on this formulation, we developed a new FMM-based vortex method on heterogeneous architectures, which distributed the work between multicore CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm uses new data structures which can dynamically manage inter-node communication and load balance efficiently, with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.","PeriodicalId":153864,"journal":{"name":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Parallel & Distributed Processing Symposium Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2014.110","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

Abstract

The fast multipole method (FMM) is often used to accelerate the calculation of particle interactions in particle-based methods to simulate incompressible flows. To evaluate the most time-consuming kernels -- the Biot-Savart equation and stretching term of the vorticity equation, we mathematically reformulated it so that only two Laplace scalar potentials are used instead of six. This automatically ensuring divergence-free far-field computation. Based on this formulation, we developed a new FMM-based vortex method on heterogeneous architectures, which distributed the work between multicore CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm uses new data structures which can dynamically manage inter-node communication and load balance efficiently, with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.

查看原文本刊更多论文

可扩展快速多极加速涡旋方法

在基于粒子的不可压缩流动模拟方法中，快速多极子法(FMM)常用于加速粒子相互作用的计算。为了评估最耗时的核——Biot-Savart方程和涡度方程的拉伸项，我们在数学上重新表述了它，以便只使用两个拉普拉斯标量势而不是六个。这自动确保无发散远场计算。在此基础上，我们开发了一种新的基于fmm的异构架构涡旋方法，该方法将工作分布在多核cpu和gpu之间，以最大限度地利用硬件资源并实现出色的可扩展性。该算法采用了新的数据结构，可以有效地动态管理节点间通信和负载平衡，并且并行构建开销很小。该算法可以扩展到大型集群，具有很强的可扩展性和较弱的可扩展性。对涡旋粒子法引起的截止函数进行了误差和时序权衡分析。我们的实现可以在55.9秒内完成32个节点上10亿个粒子的速度+拉伸计算的一个时间步，产生49.12 Tflop/s。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2014 IEEE International Parallel & Distributed Processing Symposium Workshops

自引率

0.00%

发文量