Low-Complexity Compression with Random Access
Authors: Srikanth Kamparaju, Shaik Mastan, Shashank Vatedka
Published in: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM), 2022-07-11
DOI: 10.1109/SPCOM55316.2022.9840790 (https://doi.org/10.1109/SPCOM55316.2022.9840790)
Citations: 1
Abstract
We investigate the problem of variable-length compression with random access for stationary and ergodic sources, wherein short substrings of the raw file can be extracted from the compressed file without decompressing the entire file. It is possible to design compressors for sequences of length $n$ that achieve compression rates close to the entropy rate of the source while still allowing individual source symbols to be extracted in time $\Theta(1)$ under the word-RAM model. In this article, we analyze a simple, well-known approach to compression with random access. We show theoretically that this approach is suboptimal, and design two simple compressors that simultaneously achieve the entropy rate and constant-time random access. We then propose dictionary compression as a means to further improve performance, and validate this experimentally on various datasets.
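To make the notion of random access concrete, the following is a minimal sketch of one common scheme: the file is split into fixed-size blocks, each block is compressed independently, and an offset index locates each compressed block. The abstract does not say which approach the paper analyzes, so this should be read as an illustrative assumption, not as the authors' method. The names `compress_blocks` and `read_range` and the 4096-byte block size are hypothetical, and `zlib` stands in for whatever per-block compressor one might choose.

```python
import zlib


def compress_blocks(data: bytes, block_size: int = 4096):
    """Compress each fixed-size block independently.

    Returns the concatenated compressed blocks and an offset index,
    where offsets[i]:offsets[i + 1] delimits compressed block i.
    (Hypothetical helper for illustration only.)
    """
    offsets = [0]
    chunks = []
    for start in range(0, len(data), block_size):
        chunk = zlib.compress(data[start:start + block_size])
        chunks.append(chunk)
        offsets.append(offsets[-1] + len(chunk))
    return b"".join(chunks), offsets


def read_range(payload: bytes, offsets, block_size: int, start: int, length: int) -> bytes:
    """Return data[start:start + length], decompressing only the overlapping blocks."""
    if length <= 0:
        return b""
    first = start // block_size
    # Clamp to the last existing block so reads past the end are truncated.
    last = min((start + length - 1) // block_size, len(offsets) - 2)
    out = bytearray()
    for b in range(first, last + 1):
        out += zlib.decompress(payload[offsets[b]:offsets[b + 1]])
    skip = start - first * block_size  # position of `start` inside the first block
    return bytes(out[skip:skip + length])


if __name__ == "__main__":
    raw = bytes(range(256)) * 100
    payload, offsets = compress_blocks(raw, block_size=1000)
    # Reads a substring that straddles two blocks without touching the others.
    assert read_range(payload, offsets, 1000, 990, 30) == raw[990:1020]
```

In this sketch, extracting even one symbol costs time proportional to the block size, and small blocks give the compressor less context to exploit, so the block size trades access time against compression rate. The sketch therefore does not by itself give constant-time symbol access at rates near the entropy rate.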