A tree based binary encoding of text using LZW algorithm

Proceedings DCC '95 Data Compression Conference. Pub Date: 1995-03-28. DOI: 10.1109/DCC.1995.515573

T. Acharya, A. Mukherjee


Citations: 8

Abstract

Summary form only given. The most popular adaptive dictionary coding scheme for text compression is the LZW algorithm. In LZW, a dynamically changing dictionary holds common strings encountered so far in the text, and it can be represented by a dynamic trie. The input text is examined character by character, and the longest substring of the text that already exists in the trie (called a prefix string) is replaced by a pointer to the trie node representing that prefix string. The motivation of our research is to explore a variation of the LZW algorithm for variable-length binary encoding of text (we call it the LZWA algorithm) and to develop a memory-based VLSI architecture for text compression. We propose a new methodology that represents the trie as a binary tree (we call it a binary trie) to maintain the dictionary used in the LZW scheme. This binary tree preserves all the properties of the trie and can easily be mapped into memory. As a result, the common substrings can be encoded using variable-length binary prefix codes, which allow the text to be uniquely decoded into its original form. The algorithm outperforms the usual LZW scheme when the text is small (usually less than 5 K). Depending on the characteristics of the text, the compression ratio improves by around 10-30% compared to the LZW scheme, but performance degrades for larger texts.
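The trie-based matching step described in the abstract can be illustrated with a minimal Python sketch of textbook LZW. This is not the authors' LZWA algorithm: it emits plain integer node indices rather than the variable-length prefix codes of the binary trie, whose construction the summary does not give. The names `TrieNode`, `lzw_encode`, and `lzw_decode`, and the explicit `alphabet` argument, are assumptions made for illustration.

```python
class TrieNode:
    """One dictionary entry; the path from the root spells the string."""

    def __init__(self, code):
        self.code = code      # index emitted when this node is the longest match
        self.children = {}    # next character -> TrieNode


def lzw_encode(text, alphabet):
    """Textbook LZW: emit the trie-node index of each longest matching prefix.

    Assumes every character of `text` occurs in `alphabet`.
    """
    root = TrieNode(None)
    for i, ch in enumerate(alphabet):
        root.children[ch] = TrieNode(i)
    next_code = len(alphabet)

    output = []
    node = root
    for ch in text:
        child = node.children.get(ch)
        if child is not None:
            node = child                      # extend the current match
        else:
            output.append(node.code)          # pointer to the longest prefix
            node.children[ch] = TrieNode(next_code)  # grow the dictionary
            next_code += 1
            node = root.children[ch]          # restart the match at ch
    if node is not root:
        output.append(node.code)              # flush the final match
    return output


def lzw_decode(codes, alphabet):
    """Rebuild the text by replaying the same dictionary growth."""
    if not codes:
        return ""
    table = list(alphabet)
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # Code refers to the entry still being built: prev + its first char.
            entry = prev + prev[0]
        out.append(entry)
        table.append(prev + entry[0])
        prev = entry
    return "".join(out)


codes = lzw_encode("ABABABA", "AB")
print(codes)                    # [0, 1, 2, 4]
print(lzw_decode(codes, "AB"))  # ABABABA
```

Here seven input characters become four dictionary pointers. A scheme such as the one the abstract describes would replace these fixed indices with variable-length binary codes derived from the tree structure.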



