{"title":"Performance and accuracy analysis of nonlinear k-Wave simulations using local domain decomposition with an 8-GPU server","authors":"B. Treeby, Filip Vaverka, J. Jaros","doi":"10.1121/2.0000883","DOIUrl":null,"url":null,"abstract":"Large-scale nonlinear ultrasound simulations using the open-source k-Wave toolbox are now routinely performed using the MPI version of k-Wave running on traditional CPU-based clusters. However, the all-to-all communications required by the 3D fast Fourier transform (FFT) severely impact performance when scaling to large numbers of compute cores. This can be overcome by using a domain decomposition strategy based on a local Fourier basis. In this work, we analyze the performance and accuracy of using local domain decomposition for running a high-intensity focused ultrasound (HIFU) simulation in the kidney on a single server containing eight NVIDIA P40 graphical processing units (GPUs). Different decompositions and overlap sizes are investigated and compared to a global MPI simulation running on a CPU-based supercomputer using 1280 cores. For a grid size of 960 by 960 by 1280 grid points and an overlap size of 4 grid points, the error in the simulation using local domain decomposition is on the order of 0.1$ compared to the global simulation, which is sufficient for most applications. The financial cost for running the simulation is also reduced by more than an order of magnitude.Large-scale nonlinear ultrasound simulations using the open-source k-Wave toolbox are now routinely performed using the MPI version of k-Wave running on traditional CPU-based clusters. However, the all-to-all communications required by the 3D fast Fourier transform (FFT) severely impact performance when scaling to large numbers of compute cores. This can be overcome by using a domain decomposition strategy based on a local Fourier basis. In this work, we analyze the performance and accuracy of using local domain decomposition for running a high-intensity focused ultrasound (HIFU) simulation in the kidney on a single server containing eight NVIDIA P40 graphical processing units (GPUs). Different decompositions and overlap sizes are investigated and compared to a global MPI simulation running on a CPU-based supercomputer using 1280 cores. For a grid size of 960 by 960 by 1280 grid points and an overlap size of 4 grid points, the error in the simulation using local domain decomposition is on the order of 0.1...","PeriodicalId":20469,"journal":{"name":"Proc. Meet. Acoust.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. Meet. Acoust.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1121/2.0000883","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Large-scale nonlinear ultrasound simulations using the open-source k-Wave toolbox are now routinely performed using the MPI version of k-Wave running on traditional CPU-based clusters. However, the all-to-all communications required by the 3D fast Fourier transform (FFT) severely impact performance when scaling to large numbers of compute cores. This can be overcome by using a domain decomposition strategy based on a local Fourier basis. In this work, we analyze the performance and accuracy of using local domain decomposition for running a high-intensity focused ultrasound (HIFU) simulation in the kidney on a single server containing eight NVIDIA P40 graphical processing units (GPUs). Different decompositions and overlap sizes are investigated and compared to a global MPI simulation running on a CPU-based supercomputer using 1280 cores. For a grid size of 960 by 960 by 1280 grid points and an overlap size of 4 grid points, the error in the simulation using local domain decomposition is on the order of 0.1$ compared to the global simulation, which is sufficient for most applications. The financial cost for running the simulation is also reduced by more than an order of magnitude.Large-scale nonlinear ultrasound simulations using the open-source k-Wave toolbox are now routinely performed using the MPI version of k-Wave running on traditional CPU-based clusters. However, the all-to-all communications required by the 3D fast Fourier transform (FFT) severely impact performance when scaling to large numbers of compute cores. This can be overcome by using a domain decomposition strategy based on a local Fourier basis. In this work, we analyze the performance and accuracy of using local domain decomposition for running a high-intensity focused ultrasound (HIFU) simulation in the kidney on a single server containing eight NVIDIA P40 graphical processing units (GPUs). Different decompositions and overlap sizes are investigated and compared to a global MPI simulation running on a CPU-based supercomputer using 1280 cores. For a grid size of 960 by 960 by 1280 grid points and an overlap size of 4 grid points, the error in the simulation using local domain decomposition is on the order of 0.1...