Martín Farach-Colton, P. Ferragina, S. Muthukrishnan
{"title":"克服后缀树结构中的内存瓶颈","authors":"Martín Farach-Colton, P. Ferragina, S. Muthukrishnan","doi":"10.1109/SFCS.1998.743441","DOIUrl":null,"url":null,"abstract":"The suffix tree of a string is the fundamental data structure of string processing. Recent focus on massive data sets has sparked interest in overcoming the memory bottlenecks of known algorithms for building suffix trees. Our main contribution is a new algorithm for suffix tree construction in which we choreograph almost all disk accesses to be via the sort and scan primitives. This algorithm achieves optimal results in a variety of sequential and parallel computational models. Two of our results are: In the traditional external memory model, in which only the number of disk accesses is counted, we achieve an optimal algorithm, both for single and multiple disk cases. This is the first optimal algorithm known for either model. Traditional disk page access counting does not differentiate between random page accesses and block transfers involving several consecutive pages. This difference is routinely exploited by expert programmers to get fast algorithms on real machines. We adopt a simple accounting scheme and show that our algorithm achieves the same optimal tradeoff for block versus random page accesses as the one we establish for sorting.","PeriodicalId":228145,"journal":{"name":"Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)","volume":"151 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"73","resultStr":"{\"title\":\"Overcoming the memory bottleneck in suffix tree construction\",\"authors\":\"Martín Farach-Colton, P. Ferragina, S. Muthukrishnan\",\"doi\":\"10.1109/SFCS.1998.743441\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The suffix tree of a string is the fundamental data structure of string processing. Recent focus on massive data sets has sparked interest in overcoming the memory bottlenecks of known algorithms for building suffix trees. Our main contribution is a new algorithm for suffix tree construction in which we choreograph almost all disk accesses to be via the sort and scan primitives. This algorithm achieves optimal results in a variety of sequential and parallel computational models. Two of our results are: In the traditional external memory model, in which only the number of disk accesses is counted, we achieve an optimal algorithm, both for single and multiple disk cases. This is the first optimal algorithm known for either model. Traditional disk page access counting does not differentiate between random page accesses and block transfers involving several consecutive pages. This difference is routinely exploited by expert programmers to get fast algorithms on real machines. We adopt a simple accounting scheme and show that our algorithm achieves the same optimal tradeoff for block versus random page accesses as the one we establish for sorting.\",\"PeriodicalId\":228145,\"journal\":{\"name\":\"Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)\",\"volume\":\"151 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-11-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"73\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SFCS.1998.743441\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No.98CB36280)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SFCS.1998.743441","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Overcoming the memory bottleneck in suffix tree construction
The suffix tree of a string is the fundamental data structure of string processing. Recent focus on massive data sets has sparked interest in overcoming the memory bottlenecks of known algorithms for building suffix trees. Our main contribution is a new algorithm for suffix tree construction in which we choreograph almost all disk accesses to be via the sort and scan primitives. This algorithm achieves optimal results in a variety of sequential and parallel computational models. Two of our results are: In the traditional external memory model, in which only the number of disk accesses is counted, we achieve an optimal algorithm, both for single and multiple disk cases. This is the first optimal algorithm known for either model. Traditional disk page access counting does not differentiate between random page accesses and block transfers involving several consecutive pages. This difference is routinely exploited by expert programmers to get fast algorithms on real machines. We adopt a simple accounting scheme and show that our algorithm achieves the same optimal tradeoff for block versus random page accesses as the one we establish for sorting.