Grouping algorithm for lossless data compression

N. Tadayon, G. Feng, T. Rao, E. Hinds
{"title":"Grouping algorithm for lossless data compression","authors":"N. Tadayon, G. Feng, T. Rao, E. Hinds","doi":"10.1109/DCC.1998.672316","DOIUrl":null,"url":null,"abstract":"Summary form only given. There are in fact two main parts in this paper. One is a modification to context-tree weighting algorithm known as CTW, and the other is a new algorithm called grouping. In the CTW method, we consider a binary tree as a context-tree T/sub D/, where each node s in this tree has length l(s) with 0/spl ges/l(s)/spl ges/D for a source generating a sequence of binary digits. There are counts a/sub s/, and b/sub s/, for each node s of T/sub D/ denoting the number of zeros and ones respectively. Each internal node s of the tree has two children 0s and 1s. The root of the tree corresponds to memoryless model /spl lambda/ and each node corresponds to a prefix sequence. The second part of the paper, introduces a new algorithm that considers all different binary trees of length /spl ges/D as complete sets of alphabets for a binary source. By using the KT estimator for each of these alphabet models, we find the probability distribution gained by all different complete extended alphabets. By grouping these models, we define the coding distribution as the average of the probabilities for all the models. We have demonstrated a quick algorithm for this idea and call this approach a grouping algorithm. This approach also breaks the extended alphabet model probability distribution into the non-extended one. Note that the result of this algorithm will produce at most log M(D) more code words than the optimal selection of strings of length at most D, as letters of the alphabet.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. 
No.98TB100225)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1998.672316","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Summary form only given. This paper has two main parts. The first is a modification to the context-tree weighting (CTW) algorithm; the second is a new algorithm called grouping. In the CTW method, for a source generating a sequence of binary digits, we consider a binary context tree T_D in which each node s has length l(s) with 0 ≤ l(s) ≤ D. Each node s of T_D carries counts a_s and b_s, denoting the numbers of zeros and ones respectively, and each internal node s has two children, 0s and 1s. The root of the tree corresponds to the memoryless model λ, and each node corresponds to a prefix sequence. The second part of the paper introduces a new algorithm that considers all different binary trees of depth at most D as complete alphabet sets for a binary source. Using the KT estimator for each of these alphabet models, we find the probability distribution obtained by each complete extended alphabet. By grouping these models, we define the coding distribution as the average of the probabilities over all the models. We demonstrate a fast algorithm for this idea and call the approach a grouping algorithm. The approach also decomposes the extended-alphabet model probability distribution into the non-extended one. Note that this algorithm produces at most log M(D) more code words than the optimal selection of strings of length at most D as letters of the alphabet.
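The abstract names three computable ingredients: the KT (Krichevsky-Trofimov) estimator applied to a node's zero/one counts (a_s, b_s), the CTW weighting over a context tree, and a coding distribution formed by averaging model probabilities. The sketch below illustrates all three for a binary source. It is an illustrative reconstruction from the abstract's description, not the paper's implementation; the names `Node`, `kt_block_prob`, `ctw_weighted`, and `grouped_coding_prob` are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """A node s of the context tree T_D, holding its symbol counts."""
    a: int = 0                       # a_s: number of zeros seen in this context
    b: int = 0                       # b_s: number of ones seen in this context
    child0: Optional["Node"] = None  # child 0s
    child1: Optional["Node"] = None  # child 1s

def kt_block_prob(a: int, b: int) -> float:
    """KT estimate of the probability of a block with `a` zeros and `b` ones.

    Each symbol is assigned probability (count + 1/2) / (total + 1); the
    product is order-independent, so we may feed all zeros first.
    """
    p = 1.0
    for i in range(a):               # the zeros: i symbols seen so far
        p *= (i + 0.5) / (i + 1)
    for j in range(b):               # the ones: a + j symbols seen so far
        p *= (j + 0.5) / (a + j + 1)
    return p

def ctw_weighted(node: Node) -> float:
    """Standard CTW weighted probability: average of 'this node is a leaf'
    (KT on its own counts) and 'split into children' (product of children)."""
    pe = kt_block_prob(node.a, node.b)
    if node.child0 is None and node.child1 is None:
        return pe
    return 0.5 * pe + 0.5 * ctw_weighted(node.child0) * ctw_weighted(node.child1)

def grouped_coding_prob(model_probs: list[float]) -> float:
    """Grouping step as the abstract defines it: the coding distribution is
    the plain average of the candidate alphabet models' probabilities."""
    return sum(model_probs) / len(model_probs)
```

For example, a node that has seen one zero and one one gets `kt_block_prob(1, 1) = 0.125`, and a depth-1 tree whose root saw two zeros (one in each child context) gets `ctw_weighted(...) = 0.5 * 0.375 + 0.5 * 0.5 * 0.5 = 0.3125`, mixing the memoryless and order-1 explanations of the data.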