贪婪离线文本替换的理论与实践

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225) Pub Date : 1998-03-30 DOI:10.1109/DCC.1998.672138

A. Apostolico, S. Lonardi

{"title":"贪婪离线文本替换的理论与实践","authors":"A. Apostolico, S. Lonardi","doi":"10.1109/DCC.1998.672138","DOIUrl":null,"url":null,"abstract":"Greedy off-line textual substitution refers to the following steepest descent approach to compression or structural inference. Given a long text string x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted text string, until substrings capable of producing contractions can no longer be found. This paper examines the computational issues and performance resulting from implementations of this paradigm in preliminary applications and experiments. Apart from intrinsic interest, these methods may find use in the compression of massively disseminated data, and lend themselves to efficient parallel implementation, perhaps on dedicated architectures.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":"{\"title\":\"Some theory and practice of greedy off-line textual substitution\",\"authors\":\"A. Apostolico, S. Lonardi\",\"doi\":\"10.1109/DCC.1998.672138\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Greedy off-line textual substitution refers to the following steepest descent approach to compression or structural inference. Given a long text string x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted text string, until substrings capable of producing contractions can no longer be found. This paper examines the computational issues and performance resulting from implementations of this paradigm in preliminary applications and experiments. Apart from intrinsic interest, these methods may find use in the compression of massively disseminated data, and lend themselves to efficient parallel implementation, perhaps on dedicated architectures.\",\"PeriodicalId\":191890,\"journal\":{\"name\":\"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-03-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"40\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DCC.1998.672138\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1998.672138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 40

摘要

贪婪离线文本替换指的是以下最快速下降的压缩或结构推理方法。给定一个长文本字符串x，一个子字符串w被识别出来，这样就可以用一对合适的指针替换x中除一个之外的所有w的实例，从而产生x的最大可能收缩;然后在压缩文本字符串上重复这个过程，直到再也找不到能够产生压缩的子字符串。本文研究了在初步应用和实验中实现这种范式所产生的计算问题和性能。除了固有的兴趣之外，这些方法还可以用于压缩大规模传播的数据，并且可以在专用架构上实现高效的并行实现。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Some theory and practice of greedy off-line textual substitution

Greedy off-line textual substitution refers to the following steepest descent approach to compression or structural inference. Given a long text string x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted text string, until substrings capable of producing contractions can no longer be found. This paper examines the computational issues and performance resulting from implementations of this paradigm in preliminary applications and experiments. Apart from intrinsic interest, these methods may find use in the compression of massively disseminated data, and lend themselves to efficient parallel implementation, perhaps on dedicated architectures.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)

自引率

0.00%

发文量