Design consideration for multi-lingual cascading text compressors
Authors: Chi-Hung Chi, IV YanZhang
Published in: Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096)
Publication date: 1999-03-29
DOI: 10.1109/DCC.1999.785677 (https://doi.org/10.1109/DCC.1999.785677)
Citations: 1
Abstract
Summary form only given. We study the cascading of LZ variants with Huffman coding for multilingual documents. Two models are proposed: a static model and an adaptive (dynamic) model. The static model uses the dictionary generated by the LZW algorithm in Chinese dictionary-based Huffman compression to achieve better performance. The dynamic model is an extension of the static cascading model. During the insertion of phrases into the dictionary, the frequency count of each phrase is updated, so that a dynamic Huffman tree with variable-length output tokens is obtained. We propose a new method to capture the "LZW dictionary" by picking up the dictionary entries during decompression. The general idea is to add delimiters during decompression so that the decompressed file is segmented into phrases that reflect how the LZW compressor uses its dictionary phrases to encode the source. The adaptive cascading model can be thought of as an extension of Chinese LZW compression. Since the size of the header is an important performance bottleneck in the static cascading model, we propose the adaptive cascading model to address this issue. The LZW compressor now outputs not a fixed-length token but a variable-length Huffman code from the Huffman tree. Such a compressor is expected to achieve very good compression performance. In our adaptive cascading model, we choose LZW instead of LZSS because the LZW algorithm preserves more information than the LZSS algorithm does. This characteristic is found to be very useful in helping Chinese compressors attain better performance.
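The phrase-capture idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a textbook byte-oriented LZW with an unbounded table, keeps the decoded phrases as a list instead of writing literal delimiters into the output, and uses hypothetical function names (`lzw_compress`, `lzw_decode_phrases`, `huffman_code_lengths`). The final step counts how often each phrase occurs and derives Huffman code lengths over phrases, roughly in the spirit of the static cascading model.

```python
import heapq
from collections import Counter


def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit one table index per longest-matching phrase."""
    table = {bytes([i]): i for i in range(256)}
    phrase = b""
    codes = []
    for value in data:
        candidate = phrase + bytes([value])
        if candidate in table:
            phrase = candidate
        else:
            codes.append(table[phrase])
            table[candidate] = len(table)  # next free index
            phrase = bytes([value])
    if phrase:
        codes.append(table[phrase])
    return codes


def lzw_decode_phrases(codes: list[int]) -> list[bytes]:
    """Decode LZW codes but keep every emitted phrase separate.

    The unjoined list plays the role of the delimiters: each boundary
    marks which dictionary phrase the compressor used at that step.
    """
    table = {i: bytes([i]) for i in range(256)}
    phrases = []
    previous = b""
    for code in codes:
        if code in table:
            entry = table[code]
        elif code == len(table) and previous:
            entry = previous + previous[:1]  # code refers to the entry being built
        else:
            raise ValueError(f"invalid LZW code: {code}")
        if previous:
            table[len(table)] = previous + entry[:1]
        phrases.append(entry)
        previous = entry
    return phrases


def huffman_code_lengths(freqs: dict) -> dict:
    """Return the Huffman code length in bits for each symbol."""
    if len(freqs) == 1:
        return {symbol: 1 for symbol in freqs}
    heap = [(count, i, [symbol]) for i, (symbol, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tiebreak = len(heap)  # unique counter so lists are never compared
    while len(heap) > 1:
        count_a, _, symbols_a = heapq.heappop(heap)
        count_b, _, symbols_b = heapq.heappop(heap)
        for symbol in symbols_a + symbols_b:
            lengths[symbol] += 1  # one level deeper in the tree
        heapq.heappush(heap, (count_a + count_b, tiebreak, symbols_a + symbols_b))
        tiebreak += 1
    return lengths


if __name__ == "__main__":
    # UTF-8 input is an assumption; the abstract does not name an encoding.
    text = "中文压缩 compression 中文压缩 compression".encode("utf-8")
    phrases = lzw_decode_phrases(lzw_compress(text))
    assert b"".join(phrases) == text
    lengths = huffman_code_lengths(Counter(phrases))
    print(len(phrases), "phrases,", sum(lengths[p] for p in phrases), "Huffman bits")
```

Because this sketch works on bytes, a phrase boundary can fall inside a multi-byte character; a character-level variant would avoid that. The bit total it prints excludes the cost of transmitting the phrase table, which is the header overhead the abstract names as the bottleneck of the static model. An adaptive variant would instead update the phrase counts as entries are inserted, so that no header is needed.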