{"title":"DC-Top-k:一种新的Top-k选择算法及其并行化","authors":"Z. Xue, Ruixuan Li, Heng Zhang, X. Gu, Zhiyong Xu","doi":"10.1109/ICPP.2016.49","DOIUrl":null,"url":null,"abstract":"Sorting is a basic computational task in Computer Science. As a variant of the sorting problem, top-k selecting have been widely used. To our knowledge, on average, the state-of-the-art top-k selecting algorithm Partial Quicksort takes C(n, k) = 2(n+1)Hn+2n-6k+6-2(n+3-k)Hn+1-k comparisons and about C(n, k)/6 exchanges to select the largest k terms from n terms, where Hn denotes the n-th harmonic number. In this paper, a novel top-k algorithm called DC-Top-k is proposed by employing a divide-and-conquer strategy. By a theoretical analysis, the algorithm is proved to be competitive with the state-of-the-art top-k algorithm on the compare time, with a significant improvement on the exchange time. On average, DC-Top-k takes at most (2-1/k)n+O(klog2k) comparisons and O(klog2k) exchanges to select the largest k terms from n terms. The effectiveness of the proposed algorithm is verified by a number of experiments which show that DC-Top-k is 1-3 times faster than Partial Quicksort and, moreover, is notably stabler than the latter. With an increase of k, it is also significantly more efficient than Min-heap based top-k algorithm (U. S. Patent, 2012). In the end, DC-Top-k is naturally implemented in a parallel computing environment, and a better scalability than Partial Quicksort is also demonstrated by experiments.","PeriodicalId":409991,"journal":{"name":"2016 45th International Conference on Parallel Processing (ICPP)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"DC-Top-k: A Novel Top-k Selecting Algorithm and Its Parallelization\",\"authors\":\"Z. Xue, Ruixuan Li, Heng Zhang, X. Gu, Zhiyong Xu\",\"doi\":\"10.1109/ICPP.2016.49\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sorting is a basic computational task in Computer Science. As a variant of the sorting problem, top-k selecting have been widely used. To our knowledge, on average, the state-of-the-art top-k selecting algorithm Partial Quicksort takes C(n, k) = 2(n+1)Hn+2n-6k+6-2(n+3-k)Hn+1-k comparisons and about C(n, k)/6 exchanges to select the largest k terms from n terms, where Hn denotes the n-th harmonic number. In this paper, a novel top-k algorithm called DC-Top-k is proposed by employing a divide-and-conquer strategy. By a theoretical analysis, the algorithm is proved to be competitive with the state-of-the-art top-k algorithm on the compare time, with a significant improvement on the exchange time. On average, DC-Top-k takes at most (2-1/k)n+O(klog2k) comparisons and O(klog2k) exchanges to select the largest k terms from n terms. The effectiveness of the proposed algorithm is verified by a number of experiments which show that DC-Top-k is 1-3 times faster than Partial Quicksort and, moreover, is notably stabler than the latter. With an increase of k, it is also significantly more efficient than Min-heap based top-k algorithm (U. S. Patent, 2012). In the end, DC-Top-k is naturally implemented in a parallel computing environment, and a better scalability than Partial Quicksort is also demonstrated by experiments.\",\"PeriodicalId\":409991,\"journal\":{\"name\":\"2016 45th International Conference on Parallel Processing (ICPP)\",\"volume\":\"65 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 45th International Conference on Parallel Processing (ICPP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPP.2016.49\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 45th International Conference on Parallel Processing (ICPP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPP.2016.49","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
DC-Top-k: A Novel Top-k Selecting Algorithm and Its Parallelization
Sorting is a basic computational task in Computer Science. As a variant of the sorting problem, top-k selecting have been widely used. To our knowledge, on average, the state-of-the-art top-k selecting algorithm Partial Quicksort takes C(n, k) = 2(n+1)Hn+2n-6k+6-2(n+3-k)Hn+1-k comparisons and about C(n, k)/6 exchanges to select the largest k terms from n terms, where Hn denotes the n-th harmonic number. In this paper, a novel top-k algorithm called DC-Top-k is proposed by employing a divide-and-conquer strategy. By a theoretical analysis, the algorithm is proved to be competitive with the state-of-the-art top-k algorithm on the compare time, with a significant improvement on the exchange time. On average, DC-Top-k takes at most (2-1/k)n+O(klog2k) comparisons and O(klog2k) exchanges to select the largest k terms from n terms. The effectiveness of the proposed algorithm is verified by a number of experiments which show that DC-Top-k is 1-3 times faster than Partial Quicksort and, moreover, is notably stabler than the latter. With an increase of k, it is also significantly more efficient than Min-heap based top-k algorithm (U. S. Patent, 2012). In the end, DC-Top-k is naturally implemented in a parallel computing environment, and a better scalability than Partial Quicksort is also demonstrated by experiments.