Constructing word-based text compression algorithms

Data Compression Conference, 1992. Pub Date: 1992-03-24. DOI: 10.1109/DCC.1992.227475

R. Horspool, G. Cormack

Citations: 96

Abstract

Text compression algorithms are normally defined in terms of a source alphabet Sigma of 8-bit ASCII codes. The authors consider choosing Sigma to be an alphabet whose symbols are the words of English or, more generally, alternating maximal strings of alphanumeric and nonalphanumeric characters. A compression algorithm over this alphabet can take advantage of longer-range correlations between words and thus achieve better compression. The large size of Sigma leads to some implementation problems, but these are overcome to construct word-based LZW, word-based adaptive Huffman, and word-based context modelling compression algorithms.
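The word-level alphabet described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes "alphanumeric" means the ASCII letters and digits, and the helper names `split_alternating` and `assign_symbols` are hypothetical. The sketch splits text into alternating maximal alphanumeric and nonalphanumeric strings, then numbers each distinct string in order of first appearance, which shows how Sigma grows with the input instead of being fixed at 256 symbols.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
# Each match is a maximal run of one class, so consecutive tokens
# always alternate between the two classes.
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9]+")


def split_alternating(text):
    """Split text into alternating maximal alphanumeric and
    nonalphanumeric strings. Concatenating the result restores text."""
    return _TOKEN_RE.findall(text)


def assign_symbols(tokens):
    """Map each token to an integer symbol, numbering distinct tokens
    in order of first appearance. Returns (symbol list, alphabet dict)."""
    alphabet = {}
    symbols = []
    for tok in tokens:
        if tok not in alphabet:
            alphabet[tok] = len(alphabet)  # next unused symbol number
        symbols.append(alphabet[tok])
    return symbols, alphabet


if __name__ == "__main__":
    text = "the cat and the dog"
    tokens = split_alternating(text)
    symbols, alphabet = assign_symbols(tokens)
    print(tokens)
    print(symbols)
    print(len(alphabet), "distinct symbols")
```

Because the two token classes strictly alternate, a coder built on this split could keep separate models for words and for the strings between them. How the paper's algorithms handle previously unseen words and the size of Sigma is not stated in the abstract, so the sketch leaves that out.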


