LIPT: A Lossless Text Transform to Improve Compression

Proceedings International Conference on Information Technology: Coding and Computing. Published: 2001-04-02. DOI: 10.1109/ITCC.2001.918838

F. Awan, A. Mukherjee


Citations: 54

Abstract

We propose a dictionary-based, reversible, lossless text transformation called LIPT (length index preserving transform), which can be applied to a source text to improve the ability of existing algorithms to compress it. In LIPT, the length of the input word and the offset of the word in the dictionary are denoted with letters of the alphabet. Our encoding scheme uses the recurrence of same-length words in English to create context in the transformed text that entropy coders can exploit. LIPT also achieves some compression at the preprocessing stage and retains enough context and redundancy for the compression algorithms to give better results. On our test corpus, Bzip2 with LIPT gives a 5.24% improvement in average bits per character (BPC) over Bzip2 without LIPT, and PPMD with LIPT gives a 4.46% improvement in average BPC over PPMD without LIPT.
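The abstract does not spell out the codeword layout, so the following is only a minimal sketch of the idea of denoting a word's length and dictionary offset with letters. Everything specific in it is an assumption made for illustration: the `*` marker, the 52-letter alphabet (`a`-`z`, then `A`-`Z`), the rule that offset 0 is written as an empty string, the toy dictionary, and the requirement that the input contain no `*`. Words missing from the dictionary are passed through unchanged.

```python
import re
import string

# Assumed alphabet: a-z then A-Z, used for both lengths and offsets.
LETTERS = string.ascii_lowercase + string.ascii_uppercase
MARKER = "*"  # assumed codeword prefix, not taken from the abstract


def build_blocks(words):
    """Group dictionary words by length; position in a block is the offset."""
    blocks = {}
    for word in words:
        block = blocks.setdefault(len(word), [])
        if word not in block:
            block.append(word)
    return blocks


def offset_to_letters(n):
    """Write offset n >= 0 as letters (0 -> "", 1 -> "a", 53 -> "aa")."""
    out = ""
    while n > 0:
        n -= 1
        out = LETTERS[n % 52] + out
        n //= 52
    return out


def letters_to_offset(s):
    """Inverse of offset_to_letters."""
    n = 0
    for ch in s:
        n = n * 52 + LETTERS.index(ch) + 1
    return n


def encode(text, blocks):
    """Replace each dictionary word with marker + length letter + offset."""
    if MARKER in text:
        raise ValueError("this sketch assumes the input contains no '*'")

    def encode_word(match):
        word = match.group()
        block = blocks.get(len(word), [])
        if len(word) > len(LETTERS) or word not in block:
            return word  # unknown word: leave as-is
        return MARKER + LETTERS[len(word) - 1] + offset_to_letters(block.index(word))

    return re.sub(r"[A-Za-z]+", encode_word, text)


def decode(text, blocks):
    """Map each codeword back to its dictionary word."""

    def decode_word(match):
        length = LETTERS.index(match.group(1)) + 1
        return blocks[length][letters_to_offset(match.group(2))]

    return re.sub(r"\*([A-Za-z])([A-Za-z]*)", decode_word, text)


# Toy dictionary (hypothetical); a real one would be far larger.
blocks = build_blocks(["the", "and", "for", "of", "to", "compression", "text"])
sample = "the text and the lossless compression of text"
encoded = encode(sample, blocks)
print(encoded)  # *c *d *ca *c lossless *k *b *d
print(decode(encoded, blocks) == sample)  # True
```

In this toy output, every three-letter word starts with `*c` and every four-letter word with `*d`, which illustrates the kind of repeated context that same-length words create for a downstream compressor such as Bzip2 or PPMD.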
