On optimality of variants of the block sorting compression

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225). Pub Date: 1998-03-30. DOI: 10.1109/DCC.1998.672312

K. Sadakane


Citations: 9

Abstract

Summary form only given. Block sorting uses the Burrows-Wheeler transformation (BWT), which permutes an input string. The permutation is defined by the lexicographic order of the contexts of the symbols. If we assume that the probability of a symbol is determined by the preceding k symbols, called its context, then symbols whose contexts are the same are collected in consecutive regions after the BWT. Sadakane (1997) proposed a variant of block sorting that is asymptotically optimal for any finite-order Markov source, provided that the permutation of symbols whose contexts are the same is random. However, the variant encodes l symbols as a block, and it is not practical because l is large. We propose two compression schemes that do not use blocks but instead encode symbols one by one with arithmetic codes. The move-to-front transformation is not used. The first scheme encodes symbols with different codes defined by the symbol frequencies in each context. It is asymptotically optimal for k-th order Markov sources, but it is applicable only if the order k of the source is already known. The second scheme divides the permuted string into many parts and encodes the symbols of each part with a separate arithmetic code. Each part has symbols whose contexts are the same. If the permutation is random, this scheme is asymptotically optimal for any finite-order Markov source. The permutation in the BWT is not completely random. However, we conjecture that the permuted string is memoryless and that our schemes work.
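The grouping property described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation and contains no arithmetic coder: it only sorts symbols by their preceding k-symbol context and computes the ideal (entropy) code length when one static frequency model is used per context, which is the quantity an arithmetic coder would approach. The function names `sort_by_context` and `ideal_code_length`, the cyclic treatment of the text, and the choice to compare contexts starting from the nearest preceding symbol are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def sort_by_context(text: str, k: int):
    """Return (context, symbol) pairs ordered by the k preceding symbols.

    Assumptions (not from the abstract): the text is treated as cyclic,
    and a context is read backwards, nearest preceding symbol first.
    """
    n = len(text)
    keyed = []
    for i in range(n):
        # Context of text[i]: text[i-1], text[i-2], ..., text[i-k].
        ctx = "".join(text[(i - j) % n] for j in range(1, k + 1))
        keyed.append((ctx, i))
    keyed.sort()  # equal contexts are ordered by position
    return [(ctx, text[i]) for ctx, i in keyed]


def ideal_code_length(pairs) -> float:
    """Ideal code length in bits with one static model per context.

    The cost of transmitting the frequency tables is ignored.
    """
    by_ctx = defaultdict(Counter)
    for ctx, sym in pairs:
        by_ctx[ctx][sym] += 1
    bits = 0.0
    for counts in by_ctx.values():
        total = sum(counts.values())
        for c in counts.values():
            bits -= c * math.log2(c / total)
    return bits


if __name__ == "__main__":
    text = "abababab"
    for k in (0, 1):
        pairs = sort_by_context(text, k)
        permuted = "".join(sym for _, sym in pairs)
        print(k, permuted, ideal_code_length(pairs))
```

With k = 1 every symbol in "abababab" is fully determined by its context, so symbols with the same context form two consecutive runs and the ideal code length drops from 8 bits (k = 0) to 0 bits. This mirrors the first scheme's requirement that the order k be known in advance; the second scheme, which has to find the part boundaries without knowing k, is not modelled here.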

