Error-Controlled Data Reduction Approach for Large-Scale Structured Datasets

JCR: Q3 (Computer Science)
Zhiwei Ai, Juelin Leng, Fang Xia, Huawei Wang, Yi Cao
DOI: 10.3724/sp.j.1089.2021.19263
Journal: 计算机辅助设计与图形学学报 (Journal of Computer-Aided Design & Computer Graphics), Vol. 33, published 2021-12-01
Publication type: Journal Article
Citations: 0

Abstract

The massive datasets generated by scientific and engineering simulations have reached terabytes (TB) or even petabytes (PB), making data reduction one of the most important tools for saving I/O and storage costs. To support high-precision visualization and analysis, this paper proposes an error-controlled data reduction approach for large-scale structured datasets. First, taking the difference between the reduced data and the original data as a constraint, a multi-level, structured, adaptively refined background grid is constructed according to the spatial distribution of the underlying physical fields. Second, the original data is interpolated and mapped onto the background grid, yielding data with far fewer cells and therefore a lower storage cost. Finally, the reduced data is written to the parallel file system in real time. The algorithm is implemented on the parallel programming framework JASMIN, so it can be coupled directly with numerical simulation programs developed with JASMIN. Test results show that the parallel algorithm scales to tens of thousands of CPU cores. The algorithm has been successfully applied to the electromagnetic simulation of unmanned aerial vehicle (UAV) irradiation: for a structured dataset with one hundred billion cells, the number of cells is reduced by 99.8% with a relative error below 10%. The peak signal-to-noise ratio (PSNR) between the images rendered from the reduced data and from the original data is 47.08 dB, indicating high similarity and satisfying the precision requirements of visualization.
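The first two steps of the abstract (building an error-constrained adaptive background grid, then mapping the original data onto it) and the PSNR check can be illustrated with a small serial sketch. This is not the authors' JASMIN implementation, and every function name below is hypothetical. It assumes a 2D field on a 2^k × 2^k grid, uses a quadtree in place of the paper's multi-level structured adaptive grid, reconstructs each background cell by bilinear interpolation of its four corner samples, and defines relative error as the maximum absolute error divided by the field's value range. PSNR is computed directly between the two fields rather than between rendered images, and the parallel, real-time output step is not modeled.

```python
import math


def corners(field, i0, j0, size):
    """Sample the four corner values of a size x size block (the stored reduced data)."""
    i1, j1 = i0 + size - 1, j0 + size - 1
    return (field[i0][j0], field[i0][j1], field[i1][j0], field[i1][j1])


def fill_block(out, i0, j0, size, c):
    """Bilinearly interpolate a block from its corner values c = (c00, c01, c10, c11)."""
    c00, c01, c10, c11 = c
    span = max(size - 1, 1)  # a 1x1 block reproduces its single sample exactly
    for i in range(i0, i0 + size):
        u = (i - i0) / span
        for j in range(j0, j0 + size):
            v = (j - j0) / span
            out[i][j] = ((1 - u) * (1 - v) * c00 + (1 - u) * v * c01
                         + u * (1 - v) * c10 + u * v * c11)


def refine(field, i0, j0, size, tol, value_range, scratch, leaves):
    """Keep the block as one background cell if its relative error <= tol, else split into 4."""
    c = corners(field, i0, j0, size)
    fill_block(scratch, i0, j0, size, c)
    err = max(abs(scratch[i][j] - field[i][j])
              for i in range(i0, i0 + size)
              for j in range(j0, j0 + size)) / value_range
    if err <= tol or size == 1:
        leaves.append((i0, j0, size, c))
        return
    half = size // 2
    for di in (0, half):
        for dj in (0, half):
            refine(field, i0 + di, j0 + dj, half, tol, value_range, scratch, leaves)


def reduce_field(field, tol):
    """Build the adaptive background grid; returns leaf cells (i0, j0, size, corners)."""
    n = len(field)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a 2^k x 2^k grid"
    values = [x for row in field for x in row]
    value_range = (max(values) - min(values)) or 1.0  # guard against a constant field
    scratch = [[0.0] * n for _ in range(n)]
    leaves = []
    refine(field, 0, 0, n, tol, value_range, scratch, leaves)
    return leaves


def reconstruct(leaves, n):
    """Map the reduced data back onto the original n x n grid."""
    out = [[0.0] * n for _ in range(n)]
    for i0, j0, size, c in leaves:
        fill_block(out, i0, j0, size, c)
    return out


def psnr(orig, approx):
    """PSNR in dB, using the original field's value range as the peak signal."""
    values = [x for row in orig for x in row]
    peak = max(values) - min(values)
    mse = sum((a - b) ** 2
              for ra, rb in zip(orig, approx)
              for a, b in zip(ra, rb)) / len(values)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    n = 64
    # Synthetic field: a sharp front near i = n/2 on a gently sloping background.
    field = [[math.tanh(20 * (i / (n - 1) - 0.5)) + 0.2 * j / (n - 1)
              for j in range(n)] for i in range(n)]
    leaves = reduce_field(field, tol=0.10)
    approx = reconstruct(leaves, n)
    print(f"cells: {n * n} -> {len(leaves)} ({1 - len(leaves) / (n * n):.1%} fewer)")
    print(f"PSNR: {psnr(field, approx):.2f} dB")
```

Because a background cell is accepted only when its own error is within `tol` (and 1×1 cells are exact), the reconstructed field's maximum relative error is bounded by `tol` under this sketch's error definition. Since the mean squared error is then at most (`tol` × range)², the PSNR is at least 10·log10(1/`tol`²), i.e. 20 dB for `tol = 0.10`. Smooth regions collapse into large cells while the front stays finely resolved, which is the mechanism behind the cell-count reduction.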
Source journal: 计算机辅助设计与图形学学报 (Journal of Computer-Aided Design & Computer Graphics)
Subject area: Computer Science-Computer Graphics and Computer-Aided Design
CiteScore: 1.20
Self-citation rate: 0.00%
Articles published: 6833