Compression of unicode files

P. Fenwick, S. Brierley
Published in: Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225), 1998-03-30
DOI: 10.1109/DCC.1998.672274 (https://doi.org/10.1109/DCC.1998.672274)
Citations: 11

Abstract

Summary form only given. The increasing importance of Unicode for text files, for example with Java and in some modern operating systems, implies a possible increase in data storage space and data transmission time, with a corresponding need for data compression. However, data compressors designed for traditional 8-bit byte data are not necessarily well matched to the peculiarities of Unicode data. Different "standard" text compression methods behave differently on Unicode data than their known performance on ASCII or other 8-bit data would suggest. A small corpus of Unicode files has been compressed with several widely available text compressors of various types, confirming that Unicode files have different compression characteristics from those known for 8-bit data. Tests with a simple LZ-77 compressor designed to operate in both 8-bit and 16-bit modes indicate that it may be useful to design compressors specifically for Unicode data.
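The idea of an LZ-77 compressor with 8-bit and 16-bit modes can be illustrated with a small sketch. This is not the authors' compressor, whose details are not given in the abstract; it is a toy greedy LZ-77 parser written for illustration only. The function names (`to_symbols`, `lz77_tokens`, `lz77_decode`), the window size, and the match-length limits are all assumptions, and the input is assumed to be big-endian UTF-16 text. The only difference between the two modes is the symbol width used when matching.

```python
def to_symbols(data: bytes, width: int) -> list[int]:
    """Split data into 8-bit or 16-bit symbols (big-endian in 16-bit mode)."""
    if width == 8:
        return list(data)
    if width == 16:
        if len(data) % 2:
            raise ValueError("16-bit mode needs an even number of bytes")
        return [int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2)]
    raise ValueError("width must be 8 or 16")


def lz77_tokens(symbols, window=4096, min_match=3, max_match=255):
    """Greedy LZ-77 parse (hypothetical parameters, brute-force search).

    Emits ('lit', symbol) or ('copy', distance, length) tokens.
    """
    tokens = []
    i, n = 0, len(symbols)
    while i < n:
        best_len, best_dist = 0, 0
        # Search the sliding window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < n
                   and symbols[j + length] == symbols[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("copy", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", symbols[i]))
            i += 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the symbol sequence; copies may overlap their own output."""
    out = []
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return out


if __name__ == "__main__":
    # Assumed sample input: a short repetitive string encoded as UTF-16.
    data = ("abcabcabc " * 20).encode("utf-16-be")
    for width in (8, 16):
        syms = to_symbols(data, width)
        toks = lz77_tokens(syms)
        assert lz77_decode(toks) == syms  # round-trip check
        print(f"{width}-bit mode: {len(syms)} symbols -> {len(toks)} tokens")
```

Token counts alone do not determine compressed size, since the cost of coding each literal and copy also depends on the symbol width. A real comparison would need an actual bit-level encoding of the tokens, which this sketch omits.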