International Journal of High Performance Computing Applications: Latest Articles (Page 10)

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2021-03-19 DOI: 10.1177/10943420211003313

A. Abdelfattah, H. Anzt, E. Boman, E. Carson, T. Cojean, J. Dongarra, Alyson Fox, M. Gates, N. Higham, X. Li, J. Loe, P. Luszczek, S. Pranesh, S. Rajamanickam, T. Ribizel, Barry Smith, K. Swirydowicz, Stephen J. Thomas, S. Tomov, Y. Tsai, U. Yang

Citations: 57

Data-driven global weather predictions at high resolutions

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2021-03-09 DOI: 10.1177/10943420211039818

John Taylor, P. Larraondo, B. D. de Supinski

Citations: 7

Accelerated execution via eager-release of dependencies in task-based workflows

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2021-03-03 DOI: 10.1177/1094342021997558

Hatem Elshazly, F. Lordan, J. Ejarque, R. Badia

Abstract: Task-based programming models offer a flexible way to express the unstructured parallelism patterns of nowadays complex applications. This expressive capability is required to achieve maximum possible performance for applications that are executed in distributed execution platforms. In current task-based workflows, tasks are launched for execution when their data dependencies are satisfied. However, even though the data dependencies of a certain task might have been already produced, the execution of this task will be delayed until its predecessor tasks completely finish their execution. As a consequence of this approach of releasing dependencies, the amount of parallelism inherent in applications is limited and performance improvement opportunities are wasted. To mitigate this limitation, we propose an eager approach for releasing data dependencies. Following this approach, the execution of tasks will not be delayed until their predecessor tasks completely finish their execution, instead, tasks will be launched for execution as soon as their data requirements are available. Hence, more parallelism is exposed and applications can achieve higher levels of performance by overlapping the execution of tasks. Towards achieving this goal, in this paper we propose applying two changes to task-based workflow systems. First, modifying the dependency relationships of tasks to be specified not only in terms of predecessor and successor tasks but also in terms of the data that caused these dependencies. Second, triggering the release of dependencies as soon as a predecessor task generates the output data instead of having to wait until the end of the predecessor execution to release all of its dependencies. We realize this proposal using PyCOMPSs: a task-based programming model for parallelizing Python applications. Our experiments show that using an eager approach for releasing dependencies achieves more than 50% performance improvement in the total execution time as compared to the default approach of releasing dependencies.

Volume 35, pages 325-343.
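The eager-release idea in the abstract above can be illustrated with a minimal sketch. It does not use PyCOMPSs; it relies only on Python's standard `concurrent.futures` and `threading` modules, and the names `run_workflow`, `producer`, `consumer`, and `first_output` are hypothetical. The dependency is attached to a datum (one `Future` per output) rather than to the completion of the producing task, so the consumer can start while the producer is still running.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


def run_workflow():
    """Run a two-task workflow and return (event log, consumer result)."""
    events = []                       # ordered log of what happened
    log_lock = threading.Lock()
    first_output = Future()           # one future per datum, not per task
    consumer_started = threading.Event()

    def log(message):
        with log_lock:
            events.append(message)

    def producer():
        # The first output is ready early: release it right away instead of
        # holding it until the whole task has finished.
        log("producer: released first_output")
        first_output.set_result(21)
        # Stand-in for the remaining work of the task. Waiting for the
        # consumer makes the overlap deterministic in this demonstration.
        consumer_started.wait(timeout=5)
        log("producer: finished")

    def consumer():
        value = first_output.result()  # blocks on the datum only
        log("consumer: started")
        consumer_started.set()
        return value * 2

    with ThreadPoolExecutor(max_workers=2) as pool:
        producer_done = pool.submit(producer)
        consumer_done = pool.submit(consumer)
        result = consumer_done.result()
        producer_done.result()
    return events, result


if __name__ == "__main__":
    for line in run_workflow()[0]:
        print(line)
```

Under the default policy described in the abstract, the consumer would instead wait on `producer_done`, and no overlap between the two tasks would be possible.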

Citations: 2

Task-parallel in situ temporal compression of large-scale computational fluid dynamics data

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2021-03-02 DOI: 10.1177/10943420221085000

Heather Pacella, Alec M. Dunton, A. Doostan, G. Iaccarino

Abstract: Present day computational fluid dynamics (CFD) simulations generate considerable amounts of data, sometimes on the order of TB/s. Often, a significant fraction of this data is discarded because current storage systems are unable to keep pace. To address this, data compression algorithms can be applied to data arrays containing flow quantities of interest (QoIs) to reduce the overall required storage. The matrix column interpolative decomposition (ID) can be implemented as a type of lossy compression for data matrices that factors the original data matrix into a product of two smaller factor matrices. One of these matrices consists of a subset of the columns of the original data matrix, while the other is a coefficient matrix which approximates the original data matrix columns as linear combinations of the selected columns. Motivating this work is the observation that the structure of ID algorithms makes them well suited for the asynchronous nature of task-based parallelism; they can operate independently on subdomains of the system of interest and, as a result, provide varied levels of compression. Using the task-based Legion programming model, a single-pass ID algorithm (SPID) for CFD applications is implemented. Performance studies, scalability, and the accuracy of the compression algorithm are presented for a benchmark analytical Taylor-Green vortex problem, as well as large-scale implementations of both low and high Reynolds number (Re) compressible Taylor-Green vortices using a high-order Navier-Stokes solver. In the case of the analytical solution, the resulting compressed solution was rank-one, with error on the order of machine precision. For the low-Re vortex, compression factors between 1000 and 10,000 were achieved for errors in the range 10^-2 to 10^-3. Similar error values were seen for the high-Re vortex, this time with compression factors between 100 and 1000. Moreover, strong and weak scaling results demonstrate that introducing SPID to solvers leads to negligible increases in runtime.

Volume 36, pages 388-418.
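The column interpolative decomposition described in the abstract above factors a data matrix as A ≈ A[:, cols] · P, where `cols` selects a few columns and P holds the coefficients. The following is a minimal NumPy sketch of that factorization, not the single-pass SPID algorithm or its Legion implementation. The function name `column_id` and the greedy pivoting strategy (pick the column with the largest residual norm, then deflate) are assumptions chosen for brevity.

```python
import numpy as np


def column_id(A, rank):
    """Return (cols, P) such that A is approximately A[:, cols] @ P."""
    A = np.asarray(A, dtype=float)
    residual = A.copy()
    cols = []
    for _ in range(rank):
        norms = np.linalg.norm(residual, axis=0)
        j = int(np.argmax(norms))        # column with the largest residual
        if norms[j] == 0.0:
            break                        # nothing left to approximate
        cols.append(j)
        q = residual[:, j] / norms[j]
        # Remove the chosen direction from every column of the residual.
        residual -= np.outer(q, q @ residual)
    # Coefficients expressing all columns in terms of the selected ones.
    P, *_ = np.linalg.lstsq(A[:, cols], A, rcond=None)
    return cols, P


if __name__ == "__main__":
    # Rank-one test data: a fixed spatial profile that decays in time,
    # with one column per time snapshot.
    x = np.linspace(0.0, 1.0, 50)
    t = np.linspace(0.0, 1.0, 20)
    A = np.outer(np.sin(2.0 * np.pi * x) + 2.0, np.exp(-t))
    cols, P = column_id(A, rank=1)
    error = np.linalg.norm(A - A[:, cols] @ P) / np.linalg.norm(A)
    print(cols, error)
```

For this rank-one example a single stored column plus a 1 × 20 coefficient row replaces the 50 × 20 matrix, which mirrors the rank-one result the abstract reports for the analytical Taylor-Green case.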

Citations: 7

Resilience and fault tolerance in high-performance computing for numerical weather and climate prediction

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2021-02-08 DOI: 10.1177/1094342021990433

Tommaso Benacchio, Luca Bonaventura, Mirco Altenbernd, C. Cantwell, P. Düben, M. Gillard, L. Giraud, Dominik Göddeke, E. Raffin, K. Teranishi, N. Wedi

Abstract: Progress in numerical weather and climate prediction accuracy greatly depends on the growth of the available computing power. As the number of cores in top computing facilities pushes into the millions, increased average frequency of hardware and software failures forces users to review their algorithms and systems in order to protect simulations from breakdown. This report surveys hardware, application-level and algorithm-level resilience approaches of particular relevance to time-critical numerical weather and climate prediction systems. A selection of applicable existing strategies is analysed, featuring interpolation-restart and compressed checkpointing for the numerical schemes, in-memory checkpointing, user-level failure mitigation and backup-based methods for the systems. Numerical examples showcase the performance of the techniques in addressing faults, with particular emphasis on iterative solvers for linear systems, a staple of atmospheric fluid flow solvers. The potential impact of these strategies is discussed in relation to current development of numerical weather prediction algorithms and systems towards the exascale. Trade-offs between performance, efficiency and effectiveness of resiliency strategies are analysed and some recommendations outlined for future developments.

Volume 35, pages 285-311.

Citations: 10

High Performance Computing: 36th International Conference, ISC High Performance 2021, Virtual Event, June 24 – July 2, 2021, Proceedings

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2021-01-01 DOI: 10.1007/978-3-030-78713-4

Citations: 0

High Performance Computing: 7th Latin American Conference, CARLA 2020, Cuenca, Ecuador, September 2–4, 2020, Revised Selected Papers

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2021-01-01 DOI: 10.1007/978-3-030-68035-0

Citations: 0

Point-block incomplete LU preconditioning with asynchronous iterations on GPU for multiphysics problems

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2020-12-28 DOI: 10.1177/1094342020981153

Wenpeng Ma, X. Cai

Abstract: Point-block matrices arise naturally in multiphysics problems when all variables associated with a mesh point are ordered together, and are different from the general block matrices since the sizes of the blocks are so small one can often invert some of the diagonal blocks explicitly. Motivated by the recent works of Chow and Patel and Chow et al., we propose an efficient incomplete LU (ILU) preconditioner for point-block matrices targeting applications on GPU. The construction of the preconditioner involves two critical steps: (1) the initial guessing of values for the lower and upper triangular matrices; and (2) several sweeps of asynchronous updating of the triangular matrices. Three representative problems are studied to show the advantage of the proposed point-block approach over the standard point-wise approach in terms of the number of GMRES iterations and also the total compute time. Moreover, we compare the proposed algorithm with the level-scheduling based parallel algorithm employed in NVIDIA’s cuSPARSE library as well as the serial method implemented in Intel MKL library, and the experiments show that a 2×–5× speedup can be achieved over the block-based ILU(p) factorizations from the cuSPARSE library.

Volume 35, pages 121-135.
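The two construction steps named in the abstract above (an initial guess for the triangular factors, followed by several update sweeps) can be sketched for the simpler point-wise scalar case in the style of the Chow and Patel fixed-point iteration. This is not the paper's point-block GPU algorithm: the function name `ilu_fixed_point`, the dense list-of-lists storage, the particular initial guess, and the sequential emulation of concurrent updates are all assumptions made to keep the example short. The matrix is assumed to have a nonzero diagonal.

```python
def ilu_fixed_point(A, sweeps):
    """Approximate ILU(0) factors (L, U) of A with Jacobi-style sweeps."""
    n = len(A)
    # Sparsity pattern of A; the factors are restricted to it (no fill-in).
    pattern = [(i, j) for i in range(n) for j in range(n) if A[i][j] != 0.0]

    # Step 1, initial guess: U is the upper triangle of A, L is the strictly
    # lower triangle scaled by the diagonal, with a unit diagonal.
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i, j in pattern:
        if i > j:
            L[i][j] = A[i][j] / A[j][j]
        else:
            U[i][j] = A[i][j]
    for i in range(n):
        L[i][i] = 1.0

    # Step 2, sweeps: each update reads only values from the previous sweep,
    # so all entries could be updated concurrently.
    for _ in range(sweeps):
        L_old = [row[:] for row in L]
        U_old = [row[:] for row in U]
        for i, j in pattern:
            s = sum(L_old[i][k] * U_old[k][j] for k in range(min(i, j)))
            if i > j:
                L[i][j] = (A[i][j] - s) / U_old[j][j]
            else:
                U[i][j] = A[i][j] - s
    return L, U


if __name__ == "__main__":
    A = [[4.0, -1.0, 0.0, 0.0],
         [-1.0, 4.0, -1.0, 0.0],
         [0.0, -1.0, 4.0, -1.0],
         [0.0, 0.0, -1.0, 4.0]]
    L, U = ilu_fixed_point(A, sweeps=8)
    print(U[3][3])
```

A tridiagonal matrix produces no fill-in, so for this example the converged ILU(0) factors coincide with the exact LU factors and the product L·U reproduces A.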

Citations: 1

Highly efficient lattice Boltzmann multiphase simulations of immiscible fluids at high-density ratios on CPUs and GPUs through code generation

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2020-12-11 DOI: 10.1177/10943420211016525

M. Holzer, Martin Bauer, H. Köstler, U. Rüde

Abstract: A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model supporting high-density ratios and high Reynolds numbers is presented. Meta-programming techniques are used to generate optimized code for CPUs and GPUs automatically. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations. The memory footprint of the resulting algorithm is reduced through the fusion of compute kernels. A roofline analysis demonstrates the excellent efficiency of the generated code on a single GPU. The resulting single GPU code has been integrated into the multiphysics framework waLBerla to run massively parallel simulations on large domains. Communication hiding and GPUDirect-enabled MPI yield near-perfect scaling behavior. Scaling experiments are conducted on the Piz Daint supercomputer with up to 2048 GPUs, simulating several hundred fully resolved bubbles. Further, validation of the implementation is shown in a physically relevant scenario: a three-dimensional rising air bubble in water.

Volume 35, pages 413-427.

Citations: 14

A fine-grained parallelization of the immersed boundary method

IF 3.1, Region 3 (Computer Science)

International Journal of High Performance Computing Applications Pub Date : 2020-12-11 DOI: 10.1177/10943420221083572

A. Kassen, Varun Shankar, A. Fogelson

Citations: 0