Relax-Miracle: GPU parallelization of semi-analytic Fourier-domain solvers for earthquake modeling
S. Masuti, S. Barbot, Nachiket Kapre
2014 21st International Conference on High Performance Computing (HiPC), published 2014-12-01
https://doi.org/10.1109/HIPC.2014.7116901
Effective use of GPU processing capacity for scientific workloads is often limited by memory throughput and PCIe transfer times. This is particularly true for semi-analytic Fourier-domain computations in earthquake modeling (Relax), where operations on large 3D data structures can require moving large volumes of data from storage to the compute units in predictable but orthogonal access patterns. We show how to transform the computation to avoid PCIe transfers entirely by reconstructing the 3D data structures directly in GPU global memory. We also consider arithmetic transformations that replace some communication-intensive 1D FFTs with simpler, data-parallel analytical solutions. With this approach, we reduce the computation time for a geophysical model of the 2012 Mw 8.7 Wharton Basin earthquake from 2 hours to 15 minutes (a speedup of ≈8x) on a 512×512×256 grid, comparing an NVIDIA K20 GPU against a 16-threaded Intel Xeon E5-2670 CPU using the Intel MKL libraries. Our GPU-accelerated solution, called Relax-Miracle, also makes it possible to run Markov chain Monte Carlo simulations with more than 1000 time-dependent models per day of calculation on 12 GPUs, which makes such techniques more practical for time-consuming data inversion and Bayesian inversion experiments.
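The timing figures quoted above can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope estimate, not the authors' benchmarking method: it assumes every model in the ensemble costs about as much as the Wharton Basin run and that the 12 GPUs process independent models with no idle time. The function names `speedup` and `models_per_day` are hypothetical helpers introduced for illustration.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Assumptions (not stated in the source): each model costs roughly the
# same as the Wharton Basin run, and the GPUs work on independent models
# back to back with no scheduling overhead.

CPU_MINUTES = 2 * 60       # "2 hours" on the 16-threaded Xeon E5-2670
GPU_MINUTES = 15           # "15 minutes" on the NVIDIA K20
NUM_GPUS = 12              # GPUs used for the ensemble runs
MINUTES_PER_DAY = 24 * 60  # 1440 minutes


def speedup(cpu_minutes: int, gpu_minutes: int) -> float:
    """Ratio of CPU wall time to GPU wall time for one model."""
    return cpu_minutes / gpu_minutes


def models_per_day(minutes_per_model: int, num_devices: int) -> int:
    """Whole models finished per day when each device runs models serially."""
    return (MINUTES_PER_DAY // minutes_per_model) * num_devices


if __name__ == "__main__":
    print(speedup(CPU_MINUTES, GPU_MINUTES))      # 8.0
    print(models_per_day(GPU_MINUTES, NUM_GPUS))  # 1152
```

Under these assumptions, 12 GPUs complete about 1152 models per day, which is consistent with the "more than 1000" figure, and the 120-to-15-minute reduction matches the stated ≈8x speedup.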