DC-Top-k: A Novel Top-k Selecting Algorithm and Its Parallelization

Z. Xue, Ruixuan Li, Heng Zhang, X. Gu, Zhiyong Xu
2016 45th International Conference on Parallel Processing (ICPP), August 2016. DOI: 10.1109/ICPP.2016.49
Citations: 6

Abstract

Sorting is a basic computational task in computer science. Top-k selection, a variant of the sorting problem, is widely used. To our knowledge, the state-of-the-art top-k selection algorithm, Partial Quicksort, takes on average $C(n,k) = 2(n+1)H_n + 2n - 6k + 6 - 2(n+3-k)H_{n+1-k}$ comparisons and about $C(n,k)/6$ exchanges to select the largest $k$ terms from $n$ terms, where $H_n$ denotes the $n$-th harmonic number. In this paper, we propose a novel top-k algorithm, DC-Top-k, based on a divide-and-conquer strategy. A theoretical analysis proves that the algorithm is competitive with the state-of-the-art top-k algorithm in the number of comparisons and significantly better in the number of exchanges: on average, DC-Top-k takes at most $(2 - 1/k)n + O(k\log_2 k)$ comparisons and $O(k\log_2 k)$ exchanges to select the largest $k$ terms from $n$ terms. A number of experiments verify the effectiveness of the proposed algorithm: DC-Top-k is 1 to 3 times faster than Partial Quicksort and notably more stable. As $k$ increases, it is also significantly more efficient than the min-heap-based top-k algorithm (U.S. Patent, 2012). Finally, DC-Top-k lends itself naturally to a parallel implementation, and experiments demonstrate that it scales better than Partial Quicksort.
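To make the baseline and its comparison count concrete, here is a minimal Python sketch, assuming distinct keys and a uniformly random pivot. It is not the authors' code and does not implement DC-Top-k: it implements a textbook Partial Quicksort for the largest $k$ terms, counts key comparisons, and evaluates the abstract's formula $C(n,k)$ exactly with rational arithmetic. All function names (`harmonic`, `pq_expected_comparisons`, `dc_topk_leading_term`, `partial_quicksort_topk`) are hypothetical. The Lomuto-style partition is chosen for brevity, so exchange counts here are not comparable to the $C(n,k)/6$ figure. `dc_topk_leading_term` evaluates only the $(2 - 1/k)n$ part of the DC-Top-k bound and drops the $O(k\log_2 k)$ term.

```python
import random
from fractions import Fraction


def harmonic(m: int) -> Fraction:
    """Exact m-th harmonic number H_m = 1 + 1/2 + ... + 1/m, with H_0 = 0."""
    return sum((Fraction(1, i) for i in range(1, m + 1)), Fraction(0))


def pq_expected_comparisons(n: int, k: int) -> Fraction:
    """C(n, k) as quoted in the abstract for Partial Quicksort (1 <= k <= n)."""
    return (2 * (n + 1) * harmonic(n) + 2 * n - 6 * k + 6
            - 2 * (n + 3 - k) * harmonic(n + 1 - k))


def dc_topk_leading_term(n: int, k: int) -> Fraction:
    """Leading term (2 - 1/k) n of the DC-Top-k bound; O(k log2 k) is omitted."""
    return (2 - Fraction(1, k)) * n


def partial_quicksort_topk(values, k, rng):
    """Return (largest k values in descending order, number of comparisons).

    Textbook Partial Quicksort with a random pivot and a Lomuto-style
    partition that moves larger elements to the front. This is NOT DC-Top-k.
    """
    a = list(values)
    comparisons = 0
    stack = [(0, len(a) - 1)]              # inclusive subarray bounds to process
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:                       # 0 or 1 element: nothing to do
            continue
        r = rng.randint(lo, hi)            # random pivot, moved to the end
        a[r], a[hi] = a[hi], a[r]
        pivot, store = a[hi], lo
        for i in range(lo, hi):            # exactly hi - lo comparisons
            comparisons += 1
            if a[i] > pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot lands at its final rank
        stack.append((lo, store - 1))      # left part always overlaps the top-k slots
        if store < k - 1:                  # right part needed only if it holds slots < k
            stack.append((store + 1, hi))
    return a[:k], comparisons


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, trials = 200, 10, 1000
    total = 0
    for _ in range(trials):
        data = rng.sample(range(10 * n), n)  # distinct keys, as the analysis assumes
        top, c = partial_quicksort_topk(data, k, rng)
        assert top == sorted(data, reverse=True)[:k]
        total += c
    print("empirical mean comparisons:", total / trials)
    print("C(n, k) from the formula:  ", float(pq_expected_comparisons(n, k)))
    print("(2 - 1/k) n leading term:  ", float(dc_topk_leading_term(n, k)))
```

As a sanity check on the formula, $C(n,n)$ reduces to the quicksort average $2(n+1)H_n - 4n$ and $C(n,1)$ reduces to the quickselect average $2n - 2H_n$. The sketch recurses on the right part only when the pivot's rank is below $k$, which is what lets Partial Quicksort skip work that a full sort would do.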