Engineering Compressed Static Functions

2018 Data Compression Conference. Pub Date: 2018-03-27. DOI: 10.1109/DCC.2018.00013

M. Genuzio, S. Vigna


Citations: 2

Abstract

Recent advances in the compact representation of static functions (with constant access time) have made it possible to fully exploit constructions based on random linear systems. Such constructions, albeit theoretically appealing, were previously too slow to be usable. In this paper, we extend such techniques to the problem of storing compressed static functions, in the sense that the space used per key should be close to the entropy of the list of values. From a theoretical viewpoint, we are inspired by the approach of Hreinsson, Krøyer and Pagh. Values are represented using a near-optimal instantaneous code. Then, a bit array is created so that by XOR'ing its content at a fixed number of positions depending on the key, one obtains the value, represented by its associated codeword. In the construction phase, every bit of the array is associated with an equation on Z/2Z, and solving the associated system provides the desired representation. Thus, we pass from one equation per key (the non-compressed case) to one equation per bit: the size of the system is approximately multiplied by the empirical entropy of the values, making the problem much more challenging. We show that by carefully engineering the value representation we can obtain a practical data structure. For example, we can store a function with geometrically distributed output in just 2.28 bits per key, independently of the key set. Construction time is double that of a state-of-the-art non-compressed function, which requires ≈ log log n bits per key, where n is the number of keys, and lookup time is slightly improved. We can also store a function with an output of 10^6 values distributed following a power law of exponent 2 in just 2.75 bits per key, whereas a non-compressed function would require more than 20, with a threefold increase in construction time and significantly faster lookups.
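
The idea described in the abstract (one equation on Z/2Z per codeword bit, lookup by XOR'ing array bits) can be illustrated with a toy sketch. This is not the authors' implementation. The hash function (BLAKE2b), the choice of three positions per bit, the 1.25 space slack, the plain Gaussian elimination, and all names (`positions`, `solve_gf2`, `build`, `lookup`) are assumptions made for illustration. For simplicity the sketch hashes each codeword bit separately, whereas the abstract describes a fixed number of positions per key.

```python
import hashlib

K = 3  # array positions XOR'ed per codeword bit (an assumption of this sketch)

def positions(key, i, m, seed):
    """Pseudo-random positions in an m-bit array for bit i of key's codeword."""
    out = []
    for j in range(K):
        data = f"{seed}:{key}:{i}:{j}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        out.append(int.from_bytes(digest, "big") % m)
    return out

def solve_gf2(equations):
    """Gaussian elimination over Z/2Z on (mask, rhs) rows; returns a solution or None."""
    pivots = {}  # highest set bit of a row -> (mask, rhs)
    for mask, rhs in equations:
        while mask and (mask.bit_length() - 1) in pivots:
            pmask, prhs = pivots[mask.bit_length() - 1]
            mask ^= pmask
            rhs ^= prhs
        if mask:
            pivots[mask.bit_length() - 1] = (mask, rhs)
        elif rhs:
            return None  # 0 = 1: the system is inconsistent for this seed
    x = 0  # free variables stay 0; pivot variables are set from low to high
    for p in sorted(pivots):
        mask, rhs = pivots[p]
        if rhs ^ (bin(mask & x).count("1") & 1):
            x |= 1 << p
    return x

def build(func, code, slack=1.25, max_tries=50):
    """Return (bit array as int, array size m, seed) encoding func under the given code."""
    total_bits = sum(len(code[v]) for v in func.values())
    m = int(slack * total_bits) + 8
    for seed in range(max_tries):
        equations = []
        for key, value in func.items():
            for i, c in enumerate(code[value]):
                mask = 0
                for pos in positions(key, i, m, seed):
                    mask ^= 1 << pos  # a repeated position cancels, as in lookup
                equations.append((mask, int(c)))
        x = solve_gf2(equations)
        if x is not None:
            return x, m, seed
    raise RuntimeError("no solvable system found; increase slack or max_tries")

def lookup(key, x, m, seed, decode, max_len):
    """Recover codeword bits by XOR and stop as soon as they form a codeword."""
    word = ""
    for i in range(max_len):
        bit = 0
        for pos in positions(key, i, m, seed):
            bit ^= (x >> pos) & 1
        word += str(bit)
        if word in decode:
            return decode[word]
    return None  # only possible for keys outside the original key set

# Toy data: values follow a roughly geometric distribution, coded in truncated unary.
code = {0: "0", 1: "10", 2: "110", 3: "111"}
decode = {w: v for v, w in code.items()}
func = {f"key{i}": min(((i + 1) & -(i + 1)).bit_length() - 1, 3) for i in range(200)}
x, m, seed = build(func, code)
print("bits per key:", m / len(func))
```

Because the code is instantaneous (prefix-free), lookup needs no stored length: it stops at the first complete codeword. The array size is proportional to the total number of codeword bits rather than to the number of keys times the width of the largest value, which is where the space saving comes from.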
