Error-Controlled Data Reduction Approach for Large-Scale Structured Datasets

JCR Q3 (Computer Science)

Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), Vol. 33 | Pub Date: 2021-12-01 | DOI: 10.3724/sp.j.1089.2021.19263

Zhiwei Ai, Juelin Leng, Fang Xia, Huawei Wang, Yi Cao

Citations: 0

Abstract

The massive datasets generated by scientific and engineering simulations have reached terabytes (TB) or even petabytes (PB), so data reduction has become one of the most important tools for cutting I/O and storage costs. To support high-precision visualization and analysis, this paper proposes an error-controlled data reduction approach for large-scale structured datasets. First, taking the difference between the reduced data and the original data as a constraint, a multi-level, structured, adaptively refined background grid is constructed according to the spatial distribution characteristics of the underlying physical fields. Second, the original data are interpolated and mapped onto the background grid, which yields data with far fewer cells and therefore lower storage cost. Finally, the reduced data are written to the parallel file system in real time. The algorithm is implemented on the JASMIN parallel programming framework, so it can be coupled directly with numerical simulation programs developed with JASMIN. Test results show that the parallel algorithm scales to tens of thousands of CPU cores. The algorithm has been applied successfully to the electromagnetic simulation of unmanned aerial vehicle irradiation: the cell count of a structured dataset with one hundred billion cells is reduced by 99.8%, with a relative error below 10%. The peak signal-to-noise ratio (PSNR) between the two images rendered from the reduced data and from the original data is 47.08 dB, which indicates high similarity and satisfies the precision requirements of visualization.
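The abstract describes the pipeline only at a high level. To illustrate the general idea, here is a minimal single-process Python/NumPy sketch built on several assumptions that do not come from the paper: a square 2^k x 2^k two-dimensional field, a quadtree standing in for JASMIN's multi-level patch hierarchy, the block mean as the value stored on each coarse cell, relative error defined as max |error| / max |field|, and a linear grayscale mapping standing in for rendering. All function names (`reduce_field`, `reconstruct`, `to_8bit`, `psnr`) are hypothetical and are not JASMIN APIs; the authors' actual refinement criterion and interpolation scheme may differ.

```python
import numpy as np


def reduce_field(field, tol, min_size=1):
    """Quadtree sketch of error-controlled reduction (assumed criterion).

    A block is kept as one coarse cell if replacing it by its mean keeps the
    relative error max|block - mean| / max|field| within `tol`; otherwise it
    is split into four children. With min_size=1 the bound always holds,
    because a 1x1 block is represented exactly. Returns (i0, j0, size, value).
    """
    n = field.shape[0]
    if field.shape != (n, n) or n & (n - 1):
        raise ValueError("this sketch assumes a square 2^k x 2^k field")
    scale = float(np.max(np.abs(field))) or 1.0  # normalizer for relative error
    leaves = []

    def refine(i0, j0, size):
        block = field[i0:i0 + size, j0:j0 + size]
        value = block.mean()
        rel_err = np.max(np.abs(block - value)) / scale
        if rel_err <= tol or size <= min_size:
            leaves.append((i0, j0, size, value))  # coarse cell is accurate enough
            return
        half = size // 2
        for di in (0, half):
            for dj in (0, half):
                refine(i0 + di, j0 + dj, half)  # refine where the field varies

    refine(0, 0, n)
    return leaves


def reconstruct(leaves, n):
    """Map the reduced cells back onto the original n x n grid."""
    out = np.empty((n, n))
    for i0, j0, size, value in leaves:
        out[i0:i0 + size, j0:j0 + size] = value
    return out


def to_8bit(img, lo, hi):
    """Stand-in for rendering: map [lo, hi] linearly to gray levels 0..255."""
    return np.round(255.0 * (img - lo) / (hi - lo)).clip(0, 255)


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    diff = np.asarray(reference, float) - np.asarray(test, float)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_value ** 2 / mse)


# Synthetic test field: one sharp peak on an almost flat background.
x = np.linspace(-1.0, 1.0, 256)
X, Y = np.meshgrid(x, x)
field = np.exp(-50.0 * (X ** 2 + Y ** 2))

leaves = reduce_field(field, tol=0.1)
approx = reconstruct(leaves, field.shape[0])
reduction = 1.0 - len(leaves) / field.size
max_rel_err = np.max(np.abs(approx - field)) / np.max(np.abs(field))
lo, hi = field.min(), field.max()
quality = psnr(to_8bit(field, lo, hi), to_8bit(approx, lo, hi))
print(f"cells {field.size} -> {len(leaves)} ({reduction:.1%} fewer), "
      f"max relative error {max_rel_err:.3f}, PSNR {quality:.2f} dB")
```

Refinement stops as soon as a block's coarse representation satisfies the error bound, so smooth regions collapse into a few large cells while fine cells remain around sharp features; this is the kind of behavior that makes large cell-count reductions possible under an error constraint. Assuming 8-bit images (MAX = 255), the reported 47.08 dB corresponds to a mean squared error of about 1.27, i.e. a root-mean-square difference of roughly 1.1 gray levels.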
