Grouping algorithm for lossless data compression

N. Tadayon, G. Feng, T. Rao, E. Hinds
{"title":"Grouping algorithm for lossless data compression","authors":"N. Tadayon, G. Feng, T. Rao, E. Hinds","doi":"10.1109/DCC.1998.672316","DOIUrl":null,"url":null,"abstract":"Summary form only given. There are in fact two main parts in this paper. One is a modification to context-tree weighting algorithm known as CTW, and the other is a new algorithm called grouping. In the CTW method, we consider a binary tree as a context-tree T/sub D/, where each node s in this tree has length l(s) with 0/spl ges/l(s)/spl ges/D for a source generating a sequence of binary digits. There are counts a/sub s/, and b/sub s/, for each node s of T/sub D/ denoting the number of zeros and ones respectively. Each internal node s of the tree has two children 0s and 1s. The root of the tree corresponds to memoryless model /spl lambda/ and each node corresponds to a prefix sequence. The second part of the paper, introduces a new algorithm that considers all different binary trees of length /spl ges/D as complete sets of alphabets for a binary source. By using the KT estimator for each of these alphabet models, we find the probability distribution gained by all different complete extended alphabets. By grouping these models, we define the coding distribution as the average of the probabilities for all the models. We have demonstrated a quick algorithm for this idea and call this approach a grouping algorithm. This approach also breaks the extended alphabet model probability distribution into the non-extended one. Note that the result of this algorithm will produce at most log M(D) more code words than the optimal selection of strings of length at most D, as letters of the alphabet.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. 
No.98TB100225)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1998.672316","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Summary form only given. This paper has two main parts. The first is a modification to the context-tree weighting (CTW) algorithm; the second is a new algorithm called grouping. In the CTW method, for a source generating a sequence of binary digits, we consider a binary context tree T_D in which each node s has length l(s) with 0 ≤ l(s) ≤ D. Each node s of T_D carries counts a_s and b_s, denoting the numbers of zeros and ones respectively, and each internal node s has two children, 0s and 1s. The root of the tree corresponds to the memoryless model λ, and each node corresponds to a prefix sequence. The second part of the paper introduces a new algorithm that considers all different binary trees of depth at most D as complete alphabet sets for a binary source. Using the KT estimator for each of these alphabet models, we find the probability distribution obtained by each complete extended alphabet. By grouping these models, we define the coding distribution as the average of the probabilities over all the models. We demonstrate a fast algorithm for this idea and call the approach a grouping algorithm. The approach also decomposes the extended-alphabet model probability distribution into the non-extended one. Note that this algorithm produces at most log M(D) more code words than the optimal selection of strings of length at most D as letters of the alphabet.
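The abstract names three computable ingredients: the KT (Krichevsky-Trofimov) estimator applied to a node's zero/one counts (a_s, b_s), the CTW weighting over a context tree, and a coding distribution formed by averaging model probabilities. The sketch below illustrates all three for a binary source. It is an illustrative reconstruction from the abstract's description, not the paper's implementation; the names `Node`, `kt_block_prob`, `ctw_weighted`, and `grouped_coding_prob` are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A node s of the context tree T_D, holding its symbol counts."""
    a: int = 0                       # a_s: number of zeros seen in this context
    b: int = 0                       # b_s: number of ones seen in this context
    child0: Optional["Node"] = None  # child 0s
    child1: Optional["Node"] = None  # child 1s

def kt_block_prob(a: int, b: int) -> float:
    """KT estimate of the probability of a block with `a` zeros and `b` ones.

    Each symbol is assigned probability (count + 1/2) / (total + 1); the
    product is order-independent, so we may feed all zeros first.
    """
    p = 1.0
    for i in range(a):               # the zeros: i symbols seen so far
        p *= (i + 0.5) / (i + 1)
    for j in range(b):               # the ones: a + j symbols seen so far
        p *= (j + 0.5) / (a + j + 1)
    return p

def ctw_weighted(node: Node) -> float:
    """Standard CTW weighted probability: average of 'this node is a leaf'
    (KT on its own counts) and 'split into children' (product of children)."""
    pe = kt_block_prob(node.a, node.b)
    if node.child0 is None and node.child1 is None:
        return pe
    return 0.5 * pe + 0.5 * ctw_weighted(node.child0) * ctw_weighted(node.child1)

def grouped_coding_prob(model_probs: list[float]) -> float:
    """Grouping step as the abstract defines it: the coding distribution is
    the plain average of the candidate alphabet models' probabilities."""
    return sum(model_probs) / len(model_probs)
```

For example, a node that has seen one zero and one one gets `kt_block_prob(1, 1) = 0.125`, and a depth-1 tree whose root saw two zeros (one in each child context) gets `ctw_weighted(...) = 0.5 * 0.375 + 0.5 * 0.5 * 0.5 = 0.3125`, mixing the memoryless and order-1 explanations of the data.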