Multi-lingual cascading text compressors for WWW

Chi-Hung Chi
{"title":"Multi-lingual cascading text compressors for WWW","authors":"Chi-Hung Chi","doi":"10.1109/ITCC.2000.844279","DOIUrl":null,"url":null,"abstract":"Global sharing and distribution of information on the Internet result in a great demand for efficient multi-lingual text compression for Web servers and proxy implementations. Current text compressors such as Huffman coding, Lempel-Ziv (LZ) variants, and LZ-Huffman cascading fail to perform efficiently because of the mis-matched character sampling size and the large character set of multilingual languages. Our previous research has shown that a better compression ratio can be obtained by re-adjusting the character sampling rate. We investigate the cascading of LZ variants to Huffman coding for multilingual documents. Two basic approaches, static and dynamic dictionaries, are proposed. Techniques for reducing the dictionary overhead are also suggested. Based on our multi-lingual corpus, our adaptive cascading scheme can perform better than the well-known cascading compressor, gzip, by an average of about 20%.","PeriodicalId":146581,"journal":{"name":"Proceedings International Conference on Information Technology: Coding and Computing (Cat. No.PR00540)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings International Conference on Information Technology: Coding and Computing (Cat. No.PR00540)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITCC.2000.844279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Global sharing and distribution of information on the Internet result in a great demand for efficient multi-lingual text compression in Web servers and proxy implementations. Current text compressors such as Huffman coding, Lempel-Ziv (LZ) variants, and LZ-Huffman cascading perform inefficiently on such text because their character sampling size is mismatched to it and because multi-lingual text has a large character set. Our previous research has shown that a better compression ratio can be obtained by re-adjusting the character sampling rate. We investigate cascading LZ variants with Huffman coding for multi-lingual documents. Two basic approaches are proposed, one using static dictionaries and one using dynamic dictionaries. Techniques for reducing the dictionary overhead are also suggested. On our multi-lingual corpus, our adaptive cascading scheme performs better than the well-known cascading compressor gzip by about 20% on average.
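The idea of cascading an LZ stage into Huffman coding at different character sampling sizes can be illustrated with a minimal sketch. This is not the scheme from the paper: it assumes a plain LZ78 parse, separate Huffman codes for the phrase indices and the literal symbols, and it ignores the cost of transmitting the code tables (the dictionary overhead mentioned above). All function names are hypothetical. Feeding the same text as Unicode characters or as UTF-8 bytes changes only the sampling size, so the two estimates can be compared directly.

```python
import heapq
from collections import Counter


def lz78_parse(symbols):
    """LZ78 parse: return a list of (phrase index, next symbol) pairs."""
    dictionary = {(): 0}          # index 0 is the empty phrase
    phrase = ()
    pairs = []
    for s in symbols:
        candidate = phrase + (s,)
        if candidate in dictionary:
            phrase = candidate    # keep extending the current match
        else:
            pairs.append((dictionary[phrase], s))
            dictionary[candidate] = len(dictionary)
            phrase = ()
    if phrase:
        # Flush a trailing match as (index of its prefix, its last symbol).
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_unparse(pairs):
    """Inverse of lz78_parse: rebuild the symbol sequence."""
    phrases = [()]
    out = []
    for index, s in pairs:
        phrase = phrases[index] + (s,)
        phrases.append(phrase)
        out.extend(phrase)
    return out


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}   # a lone symbol still needs one bit
    # Heap entries are (weight, unique tie-breaker, symbols in the subtree).
    heap = [(w, i, [sym]) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1              # merged symbols sink one level
        heapq.heappush(heap, (w1 + w2, tie, syms1 + syms2))
        tie += 1
    return lengths


def cascaded_size_bits(symbols):
    """Estimated payload size of LZ78 output after Huffman coding (no tables)."""
    pairs = lz78_parse(symbols)
    index_len = huffman_code_lengths(Counter(i for i, _ in pairs))
    symbol_len = huffman_code_lengths(Counter(s for _, s in pairs))
    return sum(index_len[i] + symbol_len[s] for i, s in pairs)


# Same text, two sampling sizes: one symbol per character vs. per UTF-8 byte.
text = "压缩文本 compress text " * 50
print("per character:", cascaded_size_bits(list(text)), "bits")
print("per byte:     ", cascaded_size_bits(list(text.encode("utf-8"))), "bits")
```

Because the table cost is left out, the printed figures are only payload estimates; a fair comparison against a real compressor such as gzip would have to include that overhead.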