Scalable Fast Multipole Accelerated Vortex Methods
Qi Hu, N. Gumerov, Rio Yokota, L. Barba, R. Duraiswami
2014 IEEE International Parallel & Distributed Processing Symposium Workshops, published 2014-05-19
DOI: 10.1109/IPDPSW.2014.110
Citations: 7
Abstract
The fast multipole method (FMM) is often used to accelerate the calculation of particle interactions in particle-based methods for simulating incompressible flows. To evaluate the most time-consuming kernels, the Biot-Savart equation and the stretching term of the vorticity equation, we mathematically reformulated them so that only two Laplace scalar potentials are used instead of six. This reformulation automatically ensures a divergence-free far-field computation. Based on this formulation, we developed a new FMM-based vortex method for heterogeneous architectures, which distributes the work between multicore CPUs and GPUs to make the best use of the hardware resources and achieve excellent scalability. The algorithm uses new data structures that dynamically manage inter-node communication and balance the load efficiently, with only a small parallel construction overhead. The algorithm scales to large clusters, showing both strong and weak scalability. A careful analysis of the error and timing trade-offs is also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.
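To make the Biot-Savart kernel named in the abstract concrete, the following is a minimal sketch of the direct O(N²) velocity sum that an FMM is designed to accelerate. It is not the paper's two-potential reformulation or its CPU/GPU implementation. The function name `biot_savart_direct`, the helper `cross`, and the parameter `sigma` are hypothetical, and the smoothing used here (adding `sigma**2` to the squared distance, a Rosenhead-Moore style regularization) is an assumption standing in for whichever cutoff functions the paper analyzes.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def biot_savart_direct(targets, sources, alphas, sigma=0.0):
    """Direct-sum velocity induced by vortex particles (illustrative only).

    u(x) = 1/(4*pi) * sum_j alpha_j x (x - x_j) / (|x - x_j|^2 + sigma^2)^(3/2)

    targets: evaluation points, sources: particle positions,
    alphas: particle vector strengths, sigma: assumed smoothing radius.
    """
    velocities = []
    for x in targets:
        u = [0.0, 0.0, 0.0]
        for y, alpha in zip(sources, alphas):
            r = (x[0] - y[0], x[1] - y[1], x[2] - y[2])
            d2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2 + sigma ** 2
            if d2 == 0.0:
                # Skip the singular self-interaction when sigma is zero.
                continue
            weight = 1.0 / (4.0 * math.pi * d2 ** 1.5)
            c = cross(alpha, r)
            for k in range(3):
                u[k] += weight * c[k]
        velocities.append(tuple(u))
    return velocities


# One particle at the origin with strength along z, evaluated at unit distance on x.
u = biot_savart_direct([(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)])
```

Every target visits every source, so the cost grows as the product of the two counts. That quadratic cost is what makes a direct sum impractical at the billion-particle scale quoted in the abstract, and it is the reason for the FMM.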