Some entropic bounds for Lempel-Ziv algorithms

Proceedings DCC '97. Data Compression Conference. Pub Date: 1997-03-25. DOI: 10.1109/DCC.1997.582106

S. Rao Kosaraju, G. Manzini

Citations: 14

Abstract

Summary form only given, as follows. We initiate a study of parsing-based compression algorithms such as LZ77 and LZ78 by considering the empirical entropy of the input string. For any string s, we define the k-th order entropy H_k(s) by looking at the number of occurrences of each symbol following each k-length substring inside s. The value H_k(s) is a lower bound on the compression ratio of a statistical modeling algorithm which predicts the probability of the next symbol by looking at the k most recently seen characters. Therefore, our analysis provides a means for comparing Lempel-Ziv methods with the more powerful, but slower, PPM algorithms. Our main contribution is a comparison of the compression ratio of Lempel-Ziv algorithms with the zeroth-order entropy H_0. First, we show that for low-entropy strings the LZ78 compression ratio can be much higher than H_0. Then, we present a modified algorithm which combines LZ78 with run-length encoding and can also compress low-entropy strings efficiently.
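The abstract describes the k-th order empirical entropy only in words. The sketch below is one way to compute it, assuming the convention that is common in the literature: for each k-length context, take the zeroth-order entropy of the symbols that follow it, weight by the number of followers, and divide by the length of s. That normalization, and the function names `h0`, `hk`, and `lz78_phrases`, are assumptions made for illustration and are not taken from the paper. The LZ78 parser is the textbook phrase rule (longest previously seen phrase plus one new symbol), not the modified algorithm the authors propose.

```python
import math
from collections import Counter, defaultdict


def h0(s):
    """Zeroth-order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    # Sum over distinct symbols of (n_i / n) * log2(n / n_i).
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def hk(s, k):
    """k-th order empirical entropy of s, in bits per symbol.

    Assumed convention: (1 / |s|) * sum over contexts w of
    |followers(w)| * H_0(followers(w)).
    """
    if k == 0:
        return h0(s)
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(list)
    for i in range(n - k):
        # Record the symbol that follows each k-length substring.
        followers[s[i:i + k]].append(s[i + k])
    return sum(len(f) * h0(f) for f in followers.values()) / n


def lz78_phrases(s):
    """Split s into LZ78 phrases (textbook parsing rule)."""
    seen = set()
    phrases = []
    cur = ""
    for ch in s:
        cur += ch
        if cur not in seen:
            # cur is a previously seen phrase extended by one new symbol.
            seen.add(cur)
            phrases.append(cur)
            cur = ""
    if cur:
        # Leftover tail; it may repeat an earlier phrase.
        phrases.append(cur)
    return phrases


if __name__ == "__main__":
    s = "a" * 10000
    print("H_0:", h0(s))
    print("LZ78 phrases:", len(lz78_phrases(s)))
```

For a run of a single repeated symbol, H_0 is zero, yet the LZ78 parse still produces a number of phrases that grows roughly like the square root of twice the string length, and each phrase costs output bits. This is the kind of low-entropy input on which, according to the abstract, the LZ78 compression ratio can be much higher than H_0.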
