Real-time traversal in grammar-based compressed files

L. Gąsieniec, R. Kolpakov, I. Potapov, P. Sant
Proceedings. Data Compression Conference, 2005-03-29, pages 458-
DOI: 10.1109/DCC.2005.78 (https://doi.org/10.1109/DCC.2005.78)
Citations: 51

Abstract

Summary form only given. In text compression applications, it is important to be able to process compressed data without requiring complete decompression. In this context, it is crucial to study compression methods that allow time- and space-efficient access to any fragment of a compressed file. We study the real-time recovery of consecutive symbols from compressed files in the context of grammar-based compression. In this setting, a compressed text is represented as a small (a few Kb) dictionary D containing a set of code words, and a very long (a few Mb) string of symbols drawn from the dictionary D. The space efficiency of this kind of compression is comparable with that of standard compression methods based on the Lempel-Ziv approach. We show that one can visit consecutive symbols of the original text, moving from one symbol to the next in constant time and using O(|D|) extra space. This algorithm improves on the on-line linear (amortised) time algorithm presented in (L. Gąsieniec et al., Proc. 13th Int. Symp. on Fund. of Comp. Theo., LNCS, vol. 2138, pp. 138-152, 2001).
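
To make the setting concrete, the following is a minimal Python sketch of traversing a grammar-compressed text symbol by symbol without decompressing it first. It is not the constant-time algorithm of the paper: it is a plain stack-based expansion in which a single step may take time proportional to the depth of the grammar. The rule format (integers name dictionary entries, one-character strings are terminals), the function name `traverse`, and the sample dictionary are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Assumed encoding: an int refers to a rule in the dictionary D,
# a str is a terminal symbol of the original text.
Symbol = Union[int, str]


def traverse(dictionary: Dict[int, Tuple[Symbol, ...]],
             sequence: List[Symbol]) -> Iterator[str]:
    """Lazily yield the symbols of the original text, left to right.

    Naive illustration only: each step may descend through several
    rules, so this is not a worst-case constant-time traversal.
    """
    for top in sequence:
        stack: List[Symbol] = [top]
        while stack:
            sym = stack.pop()
            if isinstance(sym, str):
                # Terminal: this is the next symbol of the original text.
                yield sym
            else:
                # Non-terminal: push its right-hand side in reverse so
                # that the leftmost symbol is expanded first.
                stack.extend(reversed(dictionary[sym]))


# Hypothetical dictionary D: rule 0 -> "ab", rule 1 -> "abc", rule 2 -> "abcab".
D = {0: ("a", "b"), 1: (0, "c"), 2: (1, 0)}
compressed = [2, 1, "a", 0]

print("".join(traverse(D, compressed)))  # prints "abcababcaab"
```

Because `traverse` is a generator, a caller can stop after any number of symbols and the rest of the text is never expanded. With rules of length two, the stack holds at most one pending symbol per grammar level, so the extra memory stays bounded by the size of the dictionary in this sketch.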