An Upper Bound and Linear-Space Queries on the LZ-End Parsing
Authors: Dominik Kempa, Barna Saha
DOI: https://doi.org/10.1137/1.9781611977073.111
Published: 2022-01-01, pages 2847-2866
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11145761/pdf/
Abstract
Lempel-Ziv (LZ77) compression is the most commonly used lossless compression algorithm. The basic idea is to greedily break the input string into blocks (called "phrases"), each time forming as a phrase the longest prefix of the unprocessed part that has an earlier occurrence. In 2010, Kreft and Navarro introduced a variant of LZ77 called LZ-End, which additionally requires the previous occurrence of each phrase to end at the boundary of an already existing phrase. Due to its excellent practical performance as a compression algorithm and a compressed index, they conjectured that it achieves a compression that can be provably upper-bounded in terms of the LZ77 size. Despite recent progress in understanding such relations for other compression algorithms (e.g., the run-length encoded Burrows-Wheeler transform), no such result is known for LZ-End. We prove that for any string of length $n$, the number $z_e$ of phrases in the LZ-End parsing satisfies $z_e = \mathcal{O}(z \log^2 n)$, where $z$ is the number of phrases in the LZ77 parsing. This is the first non-trivial upper bound on the size of the LZ-End parsing in terms of LZ77, and it puts LZ-End among the strongest dictionary compressors. Using our techniques, we also derive bounds for other variants of LZ-End and with respect to other compression measures.

Our second contribution is a data structure that implements random access queries to the text in $\mathcal{O}(z_e)$ space and $\mathcal{O}(\mathrm{polylog}\, n)$ time. This is the first linear-size structure on LZ-End that efficiently implements such queries. All previous data structures either incur a logarithmic penalty in the space or have slow queries. We also show how to extend these techniques to support longest-common-extension (LCE) queries.
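The two parsings described above can be illustrated with a naive sketch. It is not the paper's algorithm and makes no attempt at efficiency. The function names `lz77_parse` and `lz_end_parse` are hypothetical. The sketch assumes the variant in which a phrase is exactly the copied prefix (or a single fresh character when no earlier occurrence exists), and it allows LZ77 sources to overlap the phrase being formed. Other formulations, such as Kreft and Navarro's, append one explicit character to each phrase.

```python
def lz77_parse(text):
    """Greedy LZ77-style parse (assumed variant, no trailing character).

    Each phrase is the longest prefix of the unprocessed part that also
    starts at some earlier position; a single character is emitted when
    no earlier occurrence exists.
    """
    phrases, i, n = [], 0, len(text)
    while i < n:
        best = 0
        for j in range(i):  # every earlier starting position
            l = 0
            while i + l < n and text[j + l] == text[i + l]:
                l += 1
            best = max(best, l)
        length = max(best, 1)
        phrases.append(text[i:i + length])
        i += length
    return phrases


def lz_end_parse(text):
    """Greedy LZ-End-style parse (assumed variant, no trailing character).

    As above, but the earlier occurrence of a phrase must END exactly at
    the end of some previously formed phrase.
    """
    phrases, ends, i, n = [], [], 0, len(text)
    while i < n:
        best = 0
        for e in ends:  # e is an exclusive phrase end, so e <= i
            # Try lengths longer than the current best, longest first.
            for l in range(min(e, n - i), best, -1):
                if text[e - l:e] == text[i:i + l]:
                    best = l
                    break
        length = max(best, 1)
        phrases.append(text[i:i + length])
        i += length
        ends.append(i)
    return phrases


if __name__ == "__main__":
    s = "abababab"
    print(lz77_parse(s))    # phrases of the LZ77-style parse
    print(lz_end_parse(s))  # phrases of the LZ-End-style parse
```

On the string `abababab`, the LZ77-style parse can copy from an overlapping source and finishes in three phrases, while the phrase-boundary restriction forces the LZ-End-style parse to use four. The bound $z_e = \mathcal{O}(z \log^2 n)$ limits how large this gap can grow.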