CPU-FPGA异构平台上的等价联接加速

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2016-05-01 DOI:10.1109/FCCM.2016.62

Ren Chen, V. Prasanna

{"title":"CPU-FPGA异构平台上的等价联接加速","authors":"Ren Chen, V. Prasanna","doi":"10.1109/FCCM.2016.62","DOIUrl":null,"url":null,"abstract":"Accelerating database applications using FPGAs has recently been an area of growing interest in both academia and industry. Equi-join is one of the key database operations whose performance highly depends on sorting, which exhibits high memory usage on FPGA. A fully pipelined N-key merge sorter consists of log N sorting stages using O(N) memory totally. For large data sets, external memory has to be employed to perform data buffering between the sorting stages. This introduces pipeline stalls as well as several iterations between FPGA and external memory, causing significant performance degradation. In this paper, we speed-up equi-join using a hybrid CPU-FPGA heterogeneous platform. To alleviate the performance impact of limited memory, we propose a merge sort based hybrid design where the first few sorting stages in the merge sort tree are replaced with \"folded\" bitonic sorting networks. These \"folded\" bitonic sorting networks operate in parallel on the FPGA. The partial results are then merged on the CPU to produce the final sorted result. Based on this hybrid sorting design, we develop two streaming join algorithms by optimizing the classic CPU-based nested-loop join and sort-merge join algorithms. On a rangeof data set sizes, our design achieves throughput improvement of 3.1x and 1.9x compared with software-only and FPGA only implementations, respectively. Our design sustains 21.6% of thepeak bandwidth, which is 3.9x utilization obtained by the state-of-the-art FPGA equi-join implementation.","PeriodicalId":113498,"journal":{"name":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"272 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":"{\"title\":\"Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform\",\"authors\":\"Ren Chen, V. Prasanna\",\"doi\":\"10.1109/FCCM.2016.62\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accelerating database applications using FPGAs has recently been an area of growing interest in both academia and industry. Equi-join is one of the key database operations whose performance highly depends on sorting, which exhibits high memory usage on FPGA. A fully pipelined N-key merge sorter consists of log N sorting stages using O(N) memory totally. For large data sets, external memory has to be employed to perform data buffering between the sorting stages. This introduces pipeline stalls as well as several iterations between FPGA and external memory, causing significant performance degradation. In this paper, we speed-up equi-join using a hybrid CPU-FPGA heterogeneous platform. To alleviate the performance impact of limited memory, we propose a merge sort based hybrid design where the first few sorting stages in the merge sort tree are replaced with \\\"folded\\\" bitonic sorting networks. These \\\"folded\\\" bitonic sorting networks operate in parallel on the FPGA. The partial results are then merged on the CPU to produce the final sorted result. Based on this hybrid sorting design, we develop two streaming join algorithms by optimizing the classic CPU-based nested-loop join and sort-merge join algorithms. On a rangeof data set sizes, our design achieves throughput improvement of 3.1x and 1.9x compared with software-only and FPGA only implementations, respectively. Our design sustains 21.6% of thepeak bandwidth, which is 3.9x utilization obtained by the state-of-the-art FPGA equi-join implementation.\",\"PeriodicalId\":113498,\"journal\":{\"name\":\"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"volume\":\"272 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"35\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FCCM.2016.62\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2016.62","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 35

摘要

最近，使用fpga加速数据库应用已成为学术界和工业界越来越感兴趣的领域。等同连接是数据库的关键操作之一，其性能高度依赖于排序，在FPGA上具有很高的内存占用率。一个完全流水线的N键归并排序器由log N个排序阶段组成，总共使用O(N)内存。对于大型数据集，必须使用外部内存在排序阶段之间执行数据缓冲。这将导致管道中断以及FPGA和外部存储器之间的多次迭代，从而导致显著的性能下降。在本文中，我们使用一个混合的CPU-FPGA异构平台来加速等速连接。为了减轻有限内存对性能的影响，我们提出了一种基于合并排序的混合设计，其中合并排序树中的前几个排序阶段被“折叠”双元排序网络取代。这些“折叠”的双音排序网络在FPGA上并行运行。然后将部分结果合并到CPU上以产生最终排序结果。基于这种混合排序设计，我们通过优化经典的基于cpu的嵌套循环连接和排序合并连接算法，开发了两种流连接算法。在数据集大小的范围内，我们的设计与纯软件和纯FPGA实现相比，吞吐量分别提高了3.1倍和1.9倍。我们的设计维持了21.6%的峰值带宽，这是最先进的FPGA等同连接实现的3.9倍利用率。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform

Accelerating database applications using FPGAs has recently been an area of growing interest in both academia and industry. Equi-join is one of the key database operations whose performance highly depends on sorting, which exhibits high memory usage on FPGA. A fully pipelined N-key merge sorter consists of log N sorting stages using O(N) memory totally. For large data sets, external memory has to be employed to perform data buffering between the sorting stages. This introduces pipeline stalls as well as several iterations between FPGA and external memory, causing significant performance degradation. In this paper, we speed-up equi-join using a hybrid CPU-FPGA heterogeneous platform. To alleviate the performance impact of limited memory, we propose a merge sort based hybrid design where the first few sorting stages in the merge sort tree are replaced with "folded" bitonic sorting networks. These "folded" bitonic sorting networks operate in parallel on the FPGA. The partial results are then merged on the CPU to produce the final sorted result. Based on this hybrid sorting design, we develop two streaming join algorithms by optimizing the classic CPU-based nested-loop join and sort-merge join algorithms. On a rangeof data set sizes, our design achieves throughput improvement of 3.1x and 1.9x compared with software-only and FPGA only implementations, respectively. Our design sustains 21.6% of thepeak bandwidth, which is 3.9x utilization obtained by the state-of-the-art FPGA equi-join implementation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

自引率

0.00%

发文量