Task-parallel in situ temporal compression of large-scale computational fluid dynamics data
Heather Pacella, Alec M. Dunton, A. Doostan, G. Iaccarino
International Journal of High Performance Computing Applications, 36(1): 388-418. DOI: 10.1177/10943420221085000
Citations: 7
Abstract
Present-day computational fluid dynamics (CFD) simulations generate considerable amounts of data, sometimes at rates on the order of TB/s. Often, a significant fraction of this data is discarded because current storage systems cannot keep pace. To address this, data compression algorithms can be applied to data arrays containing flow quantities of interest (QoIs) to reduce the overall required storage. The matrix column interpolative decomposition (ID) can serve as a form of lossy compression for data matrices, factoring the original data matrix into a product of two smaller factor matrices. One of these matrices consists of a subset of the columns of the original data matrix, while the other is a coefficient matrix that approximates the columns of the original data matrix as linear combinations of the selected columns. Motivating this work is the observation that the structure of ID algorithms makes them well suited to the asynchronous nature of task-based parallelism; they can operate independently on subdomains of the system of interest and, as a result, provide varying levels of compression. Using the task-based Legion programming model, a single-pass ID algorithm (SPID) for CFD applications is implemented. Performance, scalability, and accuracy of the compression algorithm are presented for a benchmark analytical Taylor-Green vortex problem, as well as for large-scale implementations of both low and high Reynolds number (Re) compressible Taylor-Green vortices using a high-order Navier-Stokes solver. In the case of the analytical solution, the resulting compressed solution was rank-one, with error on the order of machine precision. For the low-Re vortex, compression factors between 1,000 and 10,000 were achieved for errors in the range 10⁻²–10⁻³. Similar error values were seen for the high-Re vortex, this time with compression factors between 100 and 1,000. Moreover, strong and weak scaling results demonstrate that introducing SPID into solvers leads to negligible increases in runtime.
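To make the factorization described above concrete, the sketch below builds a rank-k column ID of a snapshot matrix using NumPy/SciPy with a conventional pivoted-QR construction. This is not the paper's single-pass, task-parallel (Legion) SPID algorithm; it is a minimal serial stand-in, and the snapshot matrix, rank, and sizes are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import qr


def column_id(A, k):
    """Rank-k column interpolative decomposition A ~= C @ P via pivoted QR.

    C holds k columns selected from A; P expresses every column of A as a
    linear combination of those selected columns.
    """
    _, R, piv = qr(A, mode="economic", pivoting=True)
    C = A[:, piv[:k]]                      # selected columns of the original data
    # Coefficients for the remaining columns: solve R11 @ T = R12
    T = np.linalg.solve(R[:k, :k], R[:k, k:])
    P = np.empty((k, A.shape[1]))
    P[:, piv[:k]] = np.eye(k)              # selected columns reproduce themselves exactly
    P[:, piv[k:]] = T                      # remaining columns are interpolated
    return C, P


# Hypothetical snapshot matrix: each column is one time step of a flow QoI
# on one subdomain (sizes and rank chosen only for illustration).
rng = np.random.default_rng(0)
n_dofs, n_steps, true_rank = 4096, 200, 10
A = rng.standard_normal((n_dofs, true_rank)) @ rng.standard_normal((true_rank, n_steps))

C, P = column_id(A, k=10)
rel_err = np.linalg.norm(A - C @ P) / np.linalg.norm(A)
compression = A.size / (C.size + P.size)
print(f"relative error = {rel_err:.2e}, compression factor = {compression:.1f}")
```

In the paper's setting, each Legion task would apply such a decomposition independently to the snapshots of its own subdomain, and SPID does so in a single pass over the data as it is produced; the in-memory, two-pass construction above only illustrates the A ≈ CP structure of the compressed output.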