Data Encoding in Lossless Prediction-Based Compression Algorithms

2019 15th International Conference on eScience (eScience), published 2019-09-01. DOI: 10.1109/eScience.2019.00032

Ugur Çayoglu, Frank Tristram, Jörg Meyer, J. Schröter, T. Kerzenmacher, P. Braesicke, A. Streit


Citations: 1

Abstract

The increase in compute power and the development of sophisticated simulation models with higher-resolution output create a need for compression algorithms for scientific data. Several such algorithms are currently under development, and most of them are prediction-based: each value is predicted, and the residual between the prediction and the true value is saved on disk. There are currently two established forms of residual calculation: exclusive-or (XOR) and numerical difference. In this paper we summarize both techniques and show their strengths and weaknesses. We show that shifting the prediction and the true value to a binary number with certain properties results in a better compression factor at minimal additional computational cost. This gain in compression factor allows less sophisticated prediction algorithms to be used, achieving higher throughput during compression and decompression. In addition, we introduce a new encoding scheme that achieves a 9% increase in compression factor on average compared to the current state of the art.
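The two established residual calculations named in the abstract can be illustrated on the raw IEEE 754 bit patterns of 64-bit doubles. The following is a minimal sketch, not the authors' implementation: the helper names (`to_bits`, `xor_residual`, `diff_residual`, `zigzag`, `significant_bits`) are hypothetical, and the zigzag mapping of signed differences is an assumption chosen here for illustration. It does not reproduce the paper's proposed shifting step or its new encoding scheme.

```python
import math
import struct


def to_bits(x: float) -> int:
    """Reinterpret an IEEE 754 double as a 64-bit unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(b: int) -> float:
    """Inverse of to_bits: reinterpret a 64-bit unsigned integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def xor_residual(true_value: float, prediction: float) -> int:
    # Bits shared by prediction and true value become 0 in the residual.
    return to_bits(true_value) ^ to_bits(prediction)


def diff_residual(true_value: float, prediction: float) -> int:
    # Signed numerical difference of the integer bit patterns (may be negative).
    return to_bits(true_value) - to_bits(prediction)


def zigzag(d: int) -> int:
    # Hypothetical helper: map signed residuals 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    return d << 1 if d >= 0 else ((-d) << 1) - 1


def significant_bits(r: int) -> int:
    # Bits left to store once the leading zeros of a non-negative residual are dropped.
    return r.bit_length()


# Decoding is exact in both schemes, so the compression stays lossless.
def decode_xor(prediction: float, residual: int) -> float:
    return from_bits(to_bits(prediction) ^ residual)


def decode_diff(prediction: float, residual: int) -> float:
    return from_bits(to_bits(prediction) + residual)


if __name__ == "__main__":
    true_value = 1.0
    prediction = math.nextafter(1.0, 0.0)  # the largest double below 1.0
    x = xor_residual(true_value, prediction)
    d = diff_residual(true_value, prediction)
    print(f"XOR residual:  {x:#018x}, {significant_bits(x)} significant bits")
    print(f"diff residual: {d}, {significant_bits(zigzag(d))} significant bits")
    assert decode_xor(prediction, x) == true_value
    assert decode_diff(prediction, d) == true_value
```

In this example the prediction is off by one unit in the last place, but the two values straddle a power of two, so their exponent bits differ. The script prints `XOR residual:  0x001fffffffffffff, 53 significant bits` and `diff residual: 1, 2 significant bits`: the XOR residual keeps 53 significant bits, while the numerical difference needs only 2. Cases like this are one kind of strength/weakness trade-off between the two techniques.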



