大型文本数据库索引的分布式并行生成

Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing Pub Date : 1997-12-10 DOI:10.1109/ICAPP.1997.651539

João Paulo W. Kitajima, M. D. Resende, B. Ribeiro-Neto, N. Ziviani

{"title":"大型文本数据库索引的分布式并行生成","authors":"João Paulo W. Kitajima, M. D. Resende, B. Ribeiro-Neto, N. Ziviani","doi":"10.1109/ICAPP.1997.651539","DOIUrl":null,"url":null,"abstract":"We propose a new algorithm for the parallel generation of suffix arrays for large text databases on high-bandwidth computer networks. Suffix arrays are structures used in full text indexing which support very powerful query languages. Our algorithm is based on a parallel indirect mergesort (it is not a simple mergesort procedure) and is compared with a well known sequential algorithm (which is very efficient running on a single machine). Although network-bounded, the parallel version is theoretically and experimentally a much better alternative when compared to the sequential version (which is I/O-bounded in disk).","PeriodicalId":325978,"journal":{"name":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","volume":"149 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Distributed parallel generation of indices for very large text databases\",\"authors\":\"João Paulo W. Kitajima, M. D. Resende, B. Ribeiro-Neto, N. Ziviani\",\"doi\":\"10.1109/ICAPP.1997.651539\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a new algorithm for the parallel generation of suffix arrays for large text databases on high-bandwidth computer networks. Suffix arrays are structures used in full text indexing which support very powerful query languages. Our algorithm is based on a parallel indirect mergesort (it is not a simple mergesort procedure) and is compared with a well known sequential algorithm (which is very efficient running on a single machine). Although network-bounded, the parallel version is theoretically and experimentally a much better alternative when compared to the sequential version (which is I/O-bounded in disk).\",\"PeriodicalId\":325978,\"journal\":{\"name\":\"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing\",\"volume\":\"149 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAPP.1997.651539\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAPP.1997.651539","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

提出了一种基于高带宽计算机网络的大型文本数据库后缀数组并行生成算法。后缀数组是全文索引中使用的结构，它支持非常强大的查询语言。我们的算法基于并行间接归并排序(它不是一个简单的归并排序过程)，并与众所周知的顺序算法(在单台机器上运行非常有效)进行了比较。尽管有网络限制，但与顺序版本(磁盘中有I/ o限制)相比，并行版本在理论上和实验上都是更好的选择。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Distributed parallel generation of indices for very large text databases

We propose a new algorithm for the parallel generation of suffix arrays for large text databases on high-bandwidth computer networks. Suffix arrays are structures used in full text indexing which support very powerful query languages. Our algorithm is based on a parallel indirect mergesort (it is not a simple mergesort procedure) and is compared with a well known sequential algorithm (which is very efficient running on a single machine). Although network-bounded, the parallel version is theoretically and experimentally a much better alternative when compared to the sequential version (which is I/O-bounded in disk).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of 3rd International Conference on Algorithms and Architectures for Parallel Processing

自引率

0.00%

发文量