Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanisms

Xiangyu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang, S. Di, Dingwen Tao, F. Cappello
{"title":"Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanisms","authors":"Xiangyu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang, S. Di, Dingwen Tao, F. Cappello","doi":"10.1109/MSST.2019.00-15","DOIUrl":null,"url":null,"abstract":"Scientific simulations in high-performance computing (HPC) environments are producing vast volume of data, which may cause a severe I/O bottleneck at runtime and a huge burden on storage space for post-analysis. Unlike the traditional data reduction schemes (such as deduplication or lossless compression), not only can error-controlled lossy compression significantly reduce the data size but it can also hold the promise to satisfy user demand on error control. Point-wise relative error bounds (i.e., compression errors depends on the data values) are widely used by many scientific applications in the lossy compression, since error control can adapt to the precision in the dataset automatically. Point-wise relative error bounded compression is complicated and time consuming. In this work, we develop efficient precomputation-based mechanisms in the SZ lossy compression framework. Our mechanisms can avoid costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating the relative-error bounded compression with excellent compression ratios. In addition, our mechanisms also help reduce traversing operations for Huffman decoding, and thus significantly accelerate the decompression process in SZ. Experiments with four well-known real-world scientific simulation datasets show that our solution can improve the compression rate by about 30% and decompression rate by about 70% in most of cases, making our designed lossy compression strategy the best choice in class in most cases.","PeriodicalId":391517,"journal":{"name":"2019 35th Symposium on Mass Storage Systems and Technologies (MSST)","volume":"48 8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 35th Symposium on Mass Storage Systems and Technologies (MSST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSST.2019.00-15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

Scientific simulations in high-performance computing (HPC) environments produce vast volumes of data, which can cause severe I/O bottlenecks at runtime and impose a heavy storage burden for post-analysis. Unlike traditional data reduction schemes (such as deduplication or lossless compression), error-controlled lossy compression can not only significantly reduce data size but also satisfy user demands on error control. Point-wise relative error bounds (i.e., compression errors that depend on the data values) are widely used by scientific applications for lossy compression, since such error control adapts automatically to the precision of the dataset. However, point-wise relative-error-bounded compression is complicated and time-consuming. In this work, we develop efficient precomputation-based mechanisms in the SZ lossy compression framework. Our mechanisms avoid the costly logarithmic transformation and identify quantization factor values via a fast table lookup, greatly accelerating relative-error-bounded compression while preserving excellent compression ratios. In addition, our mechanisms reduce the traversal operations required for Huffman decoding, and thus significantly accelerate decompression in SZ. Experiments with four well-known real-world scientific simulation datasets show that our solution improves the compression rate by about 30% and the decompression rate by about 70% in most cases, making our lossy compression strategy the best-in-class choice in most cases.
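The two mechanisms named in the abstract can be illustrated with short sketches. First, a point-wise relative error bound on a value corresponds to an absolute bound on its logarithm, so relative-error-bounded SZ compression normally applies a per-value logarithmic transform; a precomputation-based alternative replaces that transform with a table lookup keyed by the floating-point exponent and a few leading mantissa bits. The C sketch below is a minimal illustration of this idea under our own assumptions, not the actual SZ implementation; the names `fast_log2`, `log2_table`, and `MANT_BITS` are ours.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: replace per-value log2f() calls with a lookup
 * table indexed by the IEEE-754 exponent (8 bits) plus the top
 * MANT_BITS mantissa bits of a float. */
#define MANT_BITS 6
#define TABLE_SIZE (1u << (8 + MANT_BITS))

static float log2_table[TABLE_SIZE];

static void build_table(void) {
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        /* Reconstruct the representative float for this bin: place the
         * index bits at the top of the (unsigned) bit pattern. */
        uint32_t bits = i << (23 - MANT_BITS);
        float v;
        memcpy(&v, &bits, sizeof v);
        log2_table[i] = (v > 0.0f && isfinite(v)) ? log2f(v) : 0.0f;
    }
}

static inline float fast_log2(float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof v);
    /* Drop the sign bit; keep exponent + top MANT_BITS mantissa bits. */
    uint32_t idx = (bits & 0x7FFFFFFFu) >> (23 - MANT_BITS);
    return log2_table[idx];  /* one lookup instead of one log2f() call */
}

int main(void) {
    build_table();
    float samples[] = {1.0f, 3.14159f, 1234.5f, 6.02e23f, 1.0e-20f};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("v=%g  log2f=%f  lookup=%f\n",
               (double)samples[i], (double)log2f(samples[i]),
               (double)fast_log2(samples[i]));
    return 0;
}
```

Within each bin the lookup is exact at the bin's lower edge and deviates from the true log2 by at most log2(1 + 2^-MANT_BITS), about 0.022 for MANT_BITS = 6; a real compressor would have to account for this slack when sizing its quantization bins.

Second, the decompression speedup comes from reducing traversal during Huffman decoding. A standard way to achieve this, again shown here as our own illustration rather than SZ's exact decoder, is a table of 2^MAXLEN entries that maps every possible MAXLEN-bit window of the bitstream to the symbol it begins with and that code's length, so each symbol costs one lookup instead of a bit-by-bit tree walk:

```c
#include <stdint.h>
#include <stdio.h>

#define MAXLEN 4  /* maximum code length in this toy example */
typedef struct { uint8_t symbol, length; } Entry;
static Entry decode_table[1u << MAXLEN];

/* Toy prefix-free code: A=0, B=10, C=110, D=111. */
static void build_decode_table(void) {
    static const struct { uint8_t sym, code, len; } codes[] = {
        {'A', 0x0, 1}, {'B', 0x2, 2}, {'C', 0x6, 3}, {'D', 0x7, 3},
    };
    for (int c = 0; c < 4; c++) {
        /* Every MAXLEN-bit window whose prefix equals this code maps to it. */
        int fill = MAXLEN - codes[c].len;
        for (int pad = 0; pad < (1 << fill); pad++)
            decode_table[(codes[c].code << fill) | pad] =
                (Entry){codes[c].sym, codes[c].len};
    }
}

int main(void) {
    build_decode_table();
    /* Bitstream for "ABACD": 0 10 0 110 111 -> 01001101 11000000 (MSB-first). */
    const uint8_t stream[] = {0x4D, 0xC0};
    int nsymbols = 5, bitpos = 0;
    for (int s = 0; s < nsymbols; s++) {
        /* Gather the next MAXLEN bits (MSB-first) into a window. */
        uint32_t window = 0;
        for (int b = 0; b < MAXLEN; b++) {
            int p = bitpos + b;
            int bit = (p / 8 < (int)sizeof stream)
                        ? (stream[p / 8] >> (7 - p % 8)) & 1 : 0;
            window = (window << 1) | bit;
        }
        Entry e = decode_table[window];  /* one lookup per decoded symbol */
        putchar(e.symbol);
        bitpos += e.length;
    }
    putchar('\n');
    return 0;
}
```

Running this prints ABACD, decoded with five table lookups; the same table-driven approach scales to the longer canonical codes a real SZ Huffman stream would use, at the cost of a table of 2^MAXLEN entries.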