Faster algorithms for string matching problems: matching the convolution bound
P. Indyk
Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280), 1998-11-08
DOI: 10.1109/SFCS.1998.743440 (https://doi.org/10.1109/SFCS.1998.743440)
Citations: 98
Abstract
In this paper we give a randomized O(n log n)-time algorithm for the string matching with don't cares problem. This improves the Fischer-Paterson bound from 1974 and answers the open problem posed (among others) by Weiner and Galil. Using the same technique, we give O(n log n)-time algorithms for several other problems, including subset matching, tree pattern matching, (general) approximate threshold matching and point set matching. As this bound essentially matches the complexity of computing the fast Fourier transform, which is the only known technique for solving problems of this type, it is likely that the algorithms are in fact optimal. Additionally, the technique used for the threshold matching problem can be applied to the on-line version of this problem, in which we are allowed to preprocess the text and are required to process the pattern in time sublinear in the text length. This result involves an interesting variant of the Karp-Rabin fingerprint method in which the hash functions are locality-sensitive, i.e., the probability of collision of two words depends on the distance between them.
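To illustrate why matching with don't cares reduces to a constant number of FFT-based convolutions, here is a minimal Python sketch. It is not the randomized algorithm of this paper. Instead, it swaps in a standard deterministic encoding: letters are mapped to 1..σ and the don't-care symbol to 0. The sum over j of p_j · t_{i+j} · (p_j − t_{i+j})² is zero exactly when shift i matches, because every term is non-negative and vanishes only when one side is a don't care or the two letters are equal. Expanding the square gives three correlations, each computed with an FFT. The function names (`fft`, `convolve`, `wildcard_matches`) and the `"?"` don't-care symbol are hypothetical choices made for this sketch.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True it computes the unnormalized inverse transform
    (the caller divides by len(a)).
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def convolve(x, y):
    """Linear convolution of two integer lists via FFT, rounded back to ints."""
    length = len(x) + len(y) - 1
    size = 1
    while size < length:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    prod = fft([a * b for a, b in zip(fx, fy)], invert=True)
    return [round(v.real / size) for v in prod[:length]]


def wildcard_matches(text, pattern, wildcard="?"):
    """Return every shift i at which pattern matches text[i:i+m].

    The don't-care symbol may occur in the text, the pattern, or both.
    """
    letters = sorted(set(text + pattern) - {wildcard})
    code = {c: k + 1 for k, c in enumerate(letters)}  # letters -> 1..sigma
    t = [code.get(c, 0) for c in text]                # don't care -> 0
    p = [code.get(c, 0) for c in pattern]
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []  # empty pattern treated as "no match" for simplicity
    rev = p[::-1]

    def correlate(a, b):
        # result[i] = sum_j p_j**a * t_{i+j}**b for each shift i = 0..n-m
        conv = convolve([v ** a for v in rev], [v ** b for v in t])
        return conv[m - 1 : n]

    # sum_j p_j t_{i+j} (p_j - t_{i+j})**2 = sum p^3 t - 2 p^2 t^2 + p t^3
    s31, s22, s13 = correlate(3, 1), correlate(2, 2), correlate(1, 3)
    return [i for i in range(n - m + 1) if s31[i] - 2 * s22[i] + s13[i] == 0]


print(wildcard_matches("abcabdab", "ab?"))  # [0, 3]
```

Each correlation costs O(n log n) arithmetic operations, so the whole test runs in O(n log n). The sketch relies on double-precision rounding being exact, which holds for short inputs over small alphabets. Longer texts or larger alphabets would call for an exact transform, such as a number-theoretic one.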