{"title":"Lempel-Ziv算法的一些熵界","authors":"S. Rao Kosaraju, G. Manzini","doi":"10.1109/DCC.1997.582106","DOIUrl":null,"url":null,"abstract":"Summary form only given, as follows. We initiate a study of parsing-based compression algorithms such as LZ77 and LZ78 by considering the empirical entropy of the input string. For any string s, we define the k-th order entropy H/sub k/(s) by looking at the number of occurrences of each symbol following each k-length substring inside s. The value H/sub k/(s) is a lower bound to the compression ratio of a statistical modeling algorithm which predicts the probability of the next symbol by looking at the k most recently seen characters. Therefore, our analysis provides a means for comparing Lempel-Ziv methods with the more powerful, but slower, PPM algorithms. Our main contribution is a comparison of the compression ratio of Lempel-Ziv algorithms with the zeroth order entropy H/sub 0/. First we show that for low entropy strings LZ78 compression ratio can be much higher than H/sub 0/. Then, we present a modified algorithm which combines LZ78 with run length encoding and is able to compress efficiently also low entropy strings.","PeriodicalId":403990,"journal":{"name":"Proceedings DCC '97. Data Compression Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Some entropic bounds for Lempel-Ziv algorithms\",\"authors\":\"S. Rao Kosaraju, G. Manzini\",\"doi\":\"10.1109/DCC.1997.582106\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary form only given, as follows. We initiate a study of parsing-based compression algorithms such as LZ77 and LZ78 by considering the empirical entropy of the input string. For any string s, we define the k-th order entropy H/sub k/(s) by looking at the number of occurrences of each symbol following each k-length substring inside s. The value H/sub k/(s) is a lower bound to the compression ratio of a statistical modeling algorithm which predicts the probability of the next symbol by looking at the k most recently seen characters. Therefore, our analysis provides a means for comparing Lempel-Ziv methods with the more powerful, but slower, PPM algorithms. Our main contribution is a comparison of the compression ratio of Lempel-Ziv algorithms with the zeroth order entropy H/sub 0/. First we show that for low entropy strings LZ78 compression ratio can be much higher than H/sub 0/. Then, we present a modified algorithm which combines LZ78 with run length encoding and is able to compress efficiently also low entropy strings.\",\"PeriodicalId\":403990,\"journal\":{\"name\":\"Proceedings DCC '97. Data Compression Conference\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings DCC '97. Data Compression Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.1997.582106\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC '97. Data Compression Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1997.582106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Summary form only given, as follows. We initiate a study of parsing-based compression algorithms such as LZ77 and LZ78 by considering the empirical entropy of the input string. For any string s, we define the k-th order entropy H/sub k/(s) by looking at the number of occurrences of each symbol following each k-length substring inside s. The value H/sub k/(s) is a lower bound to the compression ratio of a statistical modeling algorithm which predicts the probability of the next symbol by looking at the k most recently seen characters. Therefore, our analysis provides a means for comparing Lempel-Ziv methods with the more powerful, but slower, PPM algorithms. Our main contribution is a comparison of the compression ratio of Lempel-Ziv algorithms with the zeroth order entropy H/sub 0/. First we show that for low entropy strings LZ78 compression ratio can be much higher than H/sub 0/. Then, we present a modified algorithm which combines LZ78 with run length encoding and is able to compress efficiently also low entropy strings.