{"title":"DC-Top-k: A Novel Top-k Selecting Algorithm and Its Parallelization","authors":"Z. Xue, Ruixuan Li, Heng Zhang, X. Gu, Zhiyong Xu","doi":"10.1109/ICPP.2016.49","DOIUrl":null,"url":null,"abstract":"Sorting is a basic computational task in Computer Science. As a variant of the sorting problem, top-k selecting have been widely used. To our knowledge, on average, the state-of-the-art top-k selecting algorithm Partial Quicksort takes C(n, k) = 2(n+1)Hn+2n-6k+6-2(n+3-k)Hn+1-k comparisons and about C(n, k)/6 exchanges to select the largest k terms from n terms, where Hn denotes the n-th harmonic number. In this paper, a novel top-k algorithm called DC-Top-k is proposed by employing a divide-and-conquer strategy. By a theoretical analysis, the algorithm is proved to be competitive with the state-of-the-art top-k algorithm on the compare time, with a significant improvement on the exchange time. On average, DC-Top-k takes at most (2-1/k)n+O(klog2k) comparisons and O(klog2k) exchanges to select the largest k terms from n terms. The effectiveness of the proposed algorithm is verified by a number of experiments which show that DC-Top-k is 1-3 times faster than Partial Quicksort and, moreover, is notably stabler than the latter. With an increase of k, it is also significantly more efficient than Min-heap based top-k algorithm (U. S. Patent, 2012). In the end, DC-Top-k is naturally implemented in a parallel computing environment, and a better scalability than Partial Quicksort is also demonstrated by experiments.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 45th International Conference on Parallel Processing (ICPP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2016.49","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
Sorting is a basic computational task in Computer Science. As a variant of the sorting problem, top-k selecting have been widely used. To our knowledge, on average, the state-of-the-art top-k selecting algorithm Partial Quicksort takes C(n, k) = 2(n+1)Hn+2n-6k+6-2(n+3-k)Hn+1-k comparisons and about C(n, k)/6 exchanges to select the largest k terms from n terms, where Hn denotes the n-th harmonic number. In this paper, a novel top-k algorithm called DC-Top-k is proposed by employing a divide-and-conquer strategy. By a theoretical analysis, the algorithm is proved to be competitive with the state-of-the-art top-k algorithm on the compare time, with a significant improvement on the exchange time. On average, DC-Top-k takes at most (2-1/k)n+O(klog2k) comparisons and O(klog2k) exchanges to select the largest k terms from n terms. The effectiveness of the proposed algorithm is verified by a number of experiments which show that DC-Top-k is 1-3 times faster than Partial Quicksort and, moreover, is notably stabler than the latter. With an increase of k, it is also significantly more efficient than Min-heap based top-k algorithm (U. S. Patent, 2012). In the end, DC-Top-k is naturally implemented in a parallel computing environment, and a better scalability than Partial Quicksort is also demonstrated by experiments.