{"title":"Some theory and practice of greedy off-line textual substitution","authors":"A. Apostolico, S. Lonardi","doi":"10.1109/DCC.1998.672138","DOIUrl":null,"url":null,"abstract":"Greedy off-line textual substitution refers to the following steepest descent approach to compression or structural inference. Given a long text string x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted text string, until substrings capable of producing contractions can no longer be found. This paper examines the computational issues and performance resulting from implementations of this paradigm in preliminary applications and experiments. Apart from intrinsic interest, these methods may find use in the compression of massively disseminated data, and lend themselves to efficient parallel implementation, perhaps on dedicated architectures.","PeriodicalId":191890,"journal":{"name":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"40","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings DCC '98 Data Compression Conference (Cat. No.98TB100225)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC.1998.672138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 40
Abstract
Greedy off-line textual substitution refers to the following steepest descent approach to compression or structural inference. Given a long text string x, a substring w is identified such that replacing all instances of w in x except one by a suitable pair of pointers yields the highest possible contraction of x; the process is then repeated on the contracted text string, until substrings capable of producing contractions can no longer be found. This paper examines the computational issues and performance resulting from implementations of this paradigm in preliminary applications and experiments. Apart from intrinsic interest, these methods may find use in the compression of massively disseminated data, and lend themselves to efficient parallel implementation, perhaps on dedicated architectures.