Efficient Top-K Query Processing on Massively Parallel Hardware

Proceedings of the 2018 International Conference on Management of Data. Pub Date: 2018-05-27. DOI: 10.1145/3183713.3183735

Anil Shanbhag, H. Pirk, S. Madden

Citations: 44

Abstract

A common operation in many data analytics workloads is to find the top-k items, i.e., the largest or smallest items according to some sort order (implemented via LIMIT and ORDER BY expressions in SQL). A naive implementation of top-k sorts all of the items and then returns the first k, but this does much more work than needed. Although efficient implementations of top-k have been explored on traditional multi-core processors, there has been no prior systematic study of top-k implementations on GPUs, despite open requests for such implementations in GPU-based frameworks like TensorFlow and ArrayFire. In this work, we present several top-k algorithms for GPUs, including a new algorithm based on bitonic sort called bitonic top-k. The bitonic top-k algorithm is up to 15x faster than sort and 4x faster than a variety of other possible implementations for values of k up to 256. We also develop a cost model to predict the performance of several of our algorithms, and show that it accurately predicts actual performance on modern GPUs.
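The gap between the naive approach and a selection-based one can be illustrated with a minimal CPU-side sketch in Python. This is not the paper's bitonic top-k algorithm and says nothing about GPU performance; the function names `topk_by_sort` and `topk_by_heap` are hypothetical, chosen only for this illustration. It contrasts sorting everything with keeping a bounded heap of k candidates.

```python
import heapq
import random


def topk_by_sort(items, k):
    """Naive baseline: sort all n items in descending order, keep the first k.

    Costs O(n log n) even when k is tiny compared to n.
    """
    return sorted(items, reverse=True)[:k]


def topk_by_heap(items, k):
    """Selection with a bounded heap (standard-library heapq.nlargest).

    Costs roughly O(n log k); the result is ordered largest first.
    """
    return heapq.nlargest(k, items)


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(100_000)]
    # Both strategies return the same k largest values in the same order.
    assert topk_by_sort(data, 32) == topk_by_heap(data, 32)
```

Both functions return the same result; the difference is only in how much work is done to get there, which is the motivation the abstract gives for studying dedicated top-k algorithms.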

