A scalable, high-performance customized priority queue

2014 24th International Conference on Field Programmable Logic and Applications (FPL). Pub Date: 2014-10-20. DOI: 10.1109/FPL.2014.6927413

Muhuan Huang, Kevin T. Lim, J. Cong


Citations: 20

Abstract

Priority queues are abstract data structures in which each element is associated with a priority, and the highest-priority element is always retrieved first from the queue. The data structure is widely used within databases, including in the last stage of a merge-sort, in forecasting read-ahead I/O to stream data for the merge-sort, and in replacement selection sort. Typical software implementations use a balanced binary tree-based structure, providing O(log N) time for both enqueue and dequeue operations. To improve performance, we propose several scalable, high-speed FPGA-based implementations of a priority queue. Our insight is that the applications listed above primarily use priority queues through “replace” operations, which remove the highest-priority element and place a new element into the queue. Thus, our designs are customized for this operation, allowing for a simple and scalable architecture. We implement three priority queue designs, based on a register-based array, a register-based tree, and a BRAM-based tree, which offer different trade-offs among throughput, frequency, and maximum size. More importantly, all designs achieve O(1) time between replace operations. To incorporate the best aspects of our designs, we propose a Hybrid Priority Queue (H-PQ), which combines a register-based array with multiple BRAM-based trees. This design provides, on average, very fast access times to the top items in the queue (through the register-based array), while scaling to large priority queue sizes (through the BRAM-based trees). In our evaluations, we find that H-PQ achieves a 4.3x speedup and 21.5x better energy efficiency compared with Xeon CPU implementations.
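To make the "replace" operation concrete, here is a minimal software sketch of the last stage of a merge-sort, where every output element costs one replace on the queue. It is not the paper's FPGA design: it uses Python's standard `heapq` binary heap, so each replace takes O(log N) time rather than the O(1) interval the hardware designs report. The function name `k_way_merge` and the `(value, run_index)` entry layout are illustrative assumptions, and "highest priority" is taken to mean the smallest key.

```python
import heapq

# Sentinel marking an exhausted run (assumption: any value may appear in the data).
_DONE = object()


def k_way_merge(runs):
    """Merge already-sorted lists into one sorted list.

    Illustrative sketch only: a software binary heap stands in for the
    priority queue, and the smallest key is treated as highest priority.
    """
    iters = [iter(run) for run in runs]

    # Seed the queue with the first element of every non-empty run.
    heap = []
    for idx, it in enumerate(iters):
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, idx))
    heapq.heapify(heap)

    merged = []
    while heap:
        value, idx = heap[0]            # peek at the highest-priority element
        merged.append(value)
        successor = next(iters[idx], _DONE)
        if successor is _DONE:
            heapq.heappop(heap)         # run exhausted: plain dequeue
        else:
            # "Replace": remove the top element and insert the next element
            # from the same run in a single heap operation.
            heapq.heapreplace(heap, (successor, idx))
    return merged
```

For example, `k_way_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]])` performs six replace operations and three plain dequeues, one dequeue for each run as it empties. In the steady state of a long merge, nearly every queue operation is a replace, which is the access pattern the abstract describes.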
