Designing efficient distributed algorithms using sampling techniques

Proceedings 11th International Parallel Processing Symposium. Pub Date: 1997-04-01. DOI: 10.1109/IPPS.1997.580932

S. Rajasekaran, David S. L. Wei


Citations: 7

Abstract



This paper demonstrates the power of sampling techniques in designing efficient distributed algorithms. In particular, we show that, by using sampling techniques, selection can be done on some networks in such a way that the message complexity is independent of the cardinality of the set (file), provided the file size is polynomial in the network size. For example, given a file F of size n and an integer k (1 ≤ k ≤ n), on a p-processor de Bruijn network our deterministic selection algorithm can find the kth smallest key from F using O(p log^3 p) messages and with a communication delay of O(log^3 p), and our randomized selection algorithm can finish the same task using only O(p) messages and a communication delay of O(log p) with high probability, provided the file size is polynomial in the network size. Our randomized selection outperforms the existing approaches in terms of both message complexity and communication delay. The property that the number of messages needed and the communication delay are independent of the size of the file makes our distributed selection schemes extremely attractive in such domains as very large database systems. Making use of our selection algorithms to select pivot element(s), we also develop a nearly optimal quicksort-based sorting scheme and a nearly optimal enumeration sorting scheme for sorting large distributed files on the hypercube and de Bruijn networks. Our algorithms are fully distributed without any a priori central control.
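The abstract does not spell out the algorithms, so the following is only a rough sketch of the general idea behind sampling-based selection, not the authors' method. It is a single-process Python simulation in which each inner list stands for one processor's share of the file. Everything named here is an assumption made for illustration: the function `sample_select`, the `parts` layout, the `small` threshold, the sample size of about sqrt(n), the bracket width, and the `keys_moved` counter. A central loop plays the role of coordination, which the paper's fully distributed schemes avoid, and `keys_moved` is a crude tally of keys leaving their home list, not the paper's message-complexity measure.

```python
import math
import random


def sample_select(parts, k, seed=0, small=64):
    """Return (k-th smallest key, keys_moved) for keys spread over `parts`.

    parts: list of lists, one list per simulated processor (hypothetical layout).
    k: 1-indexed rank, 1 <= k <= n.
    keys_moved: number of keys that left their home list (samples + final gather).
    """
    rng = random.Random(seed)
    parts = [list(part) for part in parts]
    n = sum(len(part) for part in parts)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    keys_moved = 0

    while n > small:
        # Each processor contributes a small random sample of its own keys.
        s = math.ceil(math.sqrt(n))
        sample = []
        for part in parts:
            take = min(len(part), math.ceil(len(part) * s / n))
            sample.extend(rng.sample(part, take))
        sample.sort()
        m = len(sample)
        keys_moved += m

        # Pick two sample keys that are likely to bracket the target rank.
        t = k * m / n
        delta = math.sqrt(m)
        lo = sample[max(0, math.floor(t - delta))]
        hi = sample[min(m - 1, math.ceil(t + delta))]

        # Processors report only counts here, never the keys themselves.
        below = sum(sum(1 for x in part if x < lo) for part in parts)
        inside = sum(sum(1 for x in part if lo <= x <= hi) for part in parts)

        if k <= below:
            # Bracket missed on the low side: keep the keys under it.
            parts = [[x for x in part if x < lo] for part in parts]
        elif k > below + inside:
            # Bracket missed on the high side: keep the keys above it.
            parts = [[x for x in part if x > hi] for part in parts]
            k -= below + inside
        else:
            if lo == hi:
                return lo, keys_moved  # every key in the bracket is equal
            if inside == n:
                break  # no shrinkage (heavy duplicates); gather and sort
            parts = [[x for x in part if lo <= x <= hi] for part in parts]
            k -= below
        n = sum(len(part) for part in parts)

    # Few keys remain: gather them in one place and finish by sorting.
    survivors = sorted(x for part in parts for x in part)
    keys_moved += len(survivors)
    return survivors[k - 1], keys_moved


if __name__ == "__main__":
    keys = list(range(1, 100_001))
    random.Random(1).shuffle(keys)
    parts = [keys[i::16] for i in range(16)]  # 16 simulated processors
    value, moved = sample_select(parts, k=25_000)
    print(value, moved)
```

Each round either discards keys or falls back to gathering what is left, so the loop always terminates and the result is exact. Only the amount of data moved depends on how lucky the samples are.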
