An Upper Bound and Linear-Space Queries on the LZ-End Parsing.


Soil Science, Pub Date: 2022-01-01, DOI: 10.1137/1.9781611977073.111

Dominik Kempa, Barna Saha

Abstract

Lempel-Ziv (LZ77) compression is the most commonly used lossless compression algorithm. The basic idea is to greedily break the input string into blocks (called "phrases"), each time forming as a phrase the longest prefix of the unprocessed part that has an earlier occurrence. In 2010, Kreft and Navarro introduced a variant of LZ77 called LZ-End, which additionally requires the previous occurrence of each phrase to end at the boundary of an already existing phrase. Because of its excellent practical performance as a compression algorithm and as a compressed index, they conjectured that its compression can be provably upper-bounded in terms of the LZ77 size. Despite recent progress in understanding such relations for other compression algorithms (e.g., the run-length encoded Burrows-Wheeler transform), no such result is known for LZ-End. We prove that for any string of length $n$, the number $z_e$ of phrases in the LZ-End parsing satisfies $z_e = \mathcal{O}(z \log^2 n)$, where $z$ is the number of phrases in the LZ77 parsing. This is the first non-trivial upper bound on the size of the LZ-End parsing in terms of LZ77, and it puts LZ-End among the strongest dictionary compressors. Using our techniques, we also derive bounds for other variants of LZ-End and with respect to other compression measures.

Our second contribution is a data structure that implements random access queries to the text in $\mathcal{O}(z_e)$ space and $\mathcal{O}(\mathrm{polylog}\, n)$ time. This is the first linear-size structure on LZ-End that efficiently implements such queries. All previous data structures either incur a logarithmic penalty in space or have slow queries. We also show how to extend these techniques to support longest-common-extension (LCE) queries.
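The two parsings described above can be made concrete with a brute-force sketch. It assumes the common LZ77 convention in which an earlier occurrence may overlap the current phrase. It also assumes a greedy LZ-End variant without the explicit trailing character that Kreft and Navarro's original formulation appends to each phrase. The names `lz77_parse`, `lz_end_parse`, and `LZEndAccess` are hypothetical and the loops are deliberately naive. `LZEndAccess` is a simple pointer-chasing decoder that uses $\mathcal{O}(z_e)$ words but has no polylogarithmic time guarantee; it is not the data structure from the paper.

```python
from bisect import bisect_right


def lz77_parse(text):
    """Greedy LZ77 (assumed convention): each phrase is the longest prefix of
    the unprocessed suffix that also starts at an earlier position (overlap
    allowed), or a single letter if no such nonempty prefix exists."""
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best = 0
        for src in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[src + length] == text[i + length]:
                length += 1
            best = max(best, length)
        length = max(best, 1)  # no earlier occurrence -> single letter
        phrases.append(text[i:i + length])
        i += length
    return phrases


def lz_end_parse(text):
    """Greedy LZ-End (hypothetical variant, no trailing literal): each phrase
    is the longest prefix of the unprocessed suffix that also occurs ending
    exactly at an existing phrase boundary. Returns (length, src_end, char):
    a copy phrase equals text[src_end - length:src_end] and has char=None;
    a literal phrase has src_end=None and stores its character."""
    phrases = []
    ends = []  # positions where an already existing phrase ends
    i, n = 0, len(text)
    while i < n:
        best_len, best_end = 0, None
        for e in ends:
            # Longest l with text[i:i+l] == text[e-l:e]; scan l downward.
            for l in range(min(e, n - i), best_len, -1):
                if text[e - l:e] == text[i:i + l]:
                    best_len, best_end = l, e
                    break
        if best_len == 0:
            phrases.append((1, None, text[i]))
            i += 1
        else:
            phrases.append((best_len, best_end, None))
            i += best_len
        ends.append(i)
    return phrases


class LZEndAccess:
    """Naive random access over LZ-End phrases: O(z_e) stored values, but a
    query may follow many source pointers (no polylog time bound)."""

    def __init__(self, phrases):
        self.phrases = phrases
        self.starts = []
        pos = 0
        for length, _, _ in phrases:
            self.starts.append(pos)
            pos += length
        self.n = pos

    def access(self, p):
        if not 0 <= p < self.n:
            raise IndexError(p)
        while True:
            j = bisect_right(self.starts, p) - 1  # phrase containing p
            length, src_end, ch = self.phrases[j]
            if src_end is None:
                return ch
            # Jump to the same offset inside the earlier occurrence, which
            # lies strictly before phrase j, so the loop terminates.
            p = src_end - length + (p - self.starts[j])
```

On `abababab`, the LZ77 sketch yields 3 phrases (`a`, `b`, `ababab`), while the LZ-End sketch yields 4 (`a`, `b`, `ab`, `abab`). The boundary requirement keeps the repeated block from being copied in one step, so it grows from `ab` to `abab` one existing phrase end at a time.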
