Phrase hierarchy inference and compression in bounded space

C. Nevill-Manning, I. Witten
Published in: Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)
Publication date: 1998-03-30
DOI: 10.1109/DCC.1998.672146 (https://doi.org/10.1109/DCC.1998.672146)
Citations: 14

Abstract

Text compression by inferring a phrase hierarchy from the input is a technique that shows promise as a compression scheme and as a machine learning method that extracts some comprehensible account of the structure of the input text. Its performance as a data compression scheme outstrips other dictionary schemes, and the structures that it learns from sequences have been put to such eclectic uses as phrase browsing in digital libraries, music analysis, and inferring rules for fractal images. We focus attention on the memory requirements of the method. Since the algorithm operates in linear time, the space it consumes is at most linear with input size. The space consumed does in fact grow linearly with the size of the inferred hierarchy, and this makes operation on very large files infeasible. We describe two elegant ways of curtailing the space complexity of hierarchy inference, one of which yields a bounded space algorithm. We begin with a review of the hierarchy inference procedure that is embodied in the SEQUITUR program. Then we consider its performance on quite large files, and show how the compression performance improves as the file size increases.
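To make the idea of a "phrase hierarchy" concrete, here is a minimal sketch in Python. It is not SEQUITUR and not the authors' method: it uses a simpler offline scheme (greedy replacement of the most frequent adjacent pair, in the style of Re-Pair) purely to illustrate how repeated phrases can become rules that refer to other rules. The function names `infer_hierarchy` and `expand` are hypothetical, and this sketch keeps the whole sequence in memory, so its space grows with the input, which is the kind of cost the abstract is concerned with.

```python
from collections import Counter


def infer_hierarchy(text):
    """Greedy offline digram replacement (Re-Pair style), NOT SEQUITUR.

    Returns (top, rules): `top` is the compressed top-level sequence and
    `rules` maps an integer rule id to the pair of symbols it stands for.
    Terminals are one-character strings; rule ids are integers.
    """
    seq = list(text)
    rules = {}
    next_id = 1
    while True:
        # Count every adjacent pair of symbols in the current sequence.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so no further phrase can be formed
        rules[next_id] = pair
        # Rewrite the sequence left to right, replacing the chosen pair.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the text it represents."""
    if symbol not in rules:
        return symbol  # terminal character
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


if __name__ == "__main__":
    top, rules = infer_hierarchy("abcabcabc")
    print("top-level sequence:", top)
    for rule_id, (left, right) in rules.items():
        print(f"rule {rule_id} -> {left!r} {right!r}")
    print("restored:", "".join(expand(s, rules) for s in top))
```

Because rules may refer to earlier rules, the result is a hierarchy rather than a flat dictionary. SEQUITUR, as described in the abstract, builds such a hierarchy incrementally in linear time; this sketch rescans the sequence on every pass and makes no such guarantee.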