{"title":"Engineering Compressed Static Functions","authors":"M. Genuzio, S. Vigna","doi":"10.1109/DCC.2018.00013","DOIUrl":null,"url":null,"abstract":"Recent advances in the compact representation of static functions (with constant access time) have made it possible to fully exploit constructions based on random linear system. Such constructions, albeit theoretically appealing, were previously too slow to be usable. In this paper, we extend such techniques to the problem of storing compressed static functions, in the sense that the space used per key should be close to the entropy of the list of values. From a theoretical viewpoint, we are inspired by the approach of Hreinsson, Krøyer and Pagh. Values are represented using a near-optimal instantaneous code. Then, a bit array is created so that by XOR’ing its content at a fixed number of positions depending on the key one obtains the value, represented by its associated codeword. In the construction phase, every bit of the array is associated with an equation on Z/2Z, and solving the associated system provides the desired representation. Thus, we pass from one equation per key (the non-compressed case) to one equation per bit: the size of the system is thus approximately multiplied by the empirical entropy of the values, making the problem much more challenging. We show that by carefully engineering the value representation we can obtain a practical data structure. For example, we can store a function with geometrically distributed output in just 2.28 bits per key, independently of the key set, with a construction time double with respect to that of a state-of-the-art non-compressed function, which requires ≈ log log n bits per key, where n is the number of keys, and slightly improved lookup time. We can also store a function with an output of 106 values distributed following a power law of exponent 2 in just 2.75 bits per key, whereas a non-compressed function would require more than 20, with a threefold increase in construction time and significantly faster lookups.","PeriodicalId":137206,"journal":{"name":"2018 Data Compression Conference","volume":"118 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.2018.00013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Recent advances in the compact representation of static functions (with constant access time) have made it possible to fully exploit constructions based on random linear system. Such constructions, albeit theoretically appealing, were previously too slow to be usable. In this paper, we extend such techniques to the problem of storing compressed static functions, in the sense that the space used per key should be close to the entropy of the list of values. From a theoretical viewpoint, we are inspired by the approach of Hreinsson, Krøyer and Pagh. Values are represented using a near-optimal instantaneous code. Then, a bit array is created so that by XOR’ing its content at a fixed number of positions depending on the key one obtains the value, represented by its associated codeword. In the construction phase, every bit of the array is associated with an equation on Z/2Z, and solving the associated system provides the desired representation. Thus, we pass from one equation per key (the non-compressed case) to one equation per bit: the size of the system is thus approximately multiplied by the empirical entropy of the values, making the problem much more challenging. We show that by carefully engineering the value representation we can obtain a practical data structure. For example, we can store a function with geometrically distributed output in just 2.28 bits per key, independently of the key set, with a construction time double with respect to that of a state-of-the-art non-compressed function, which requires ≈ log log n bits per key, where n is the number of keys, and slightly improved lookup time. We can also store a function with an output of 106 values distributed following a power law of exponent 2 in just 2.75 bits per key, whereas a non-compressed function would require more than 20, with a threefold increase in construction time and significantly faster lookups.