Hypersort: High-performance Parallel Sorting on HBM-enabled FPGA

2022 International Conference on Field-Programmable Technology (ICFPT). Pub Date: 2022-12-05. DOI: 10.1109/ICFPT56656.2022.9974209

Soundarya Jayaraman, Bingyi Zhang, V. Prasanna



Abstract

Accelerating sorting on FPGAs has been extensively studied by leveraging their fine-grained data parallelism. However, with optimized hardware pipelines, the performance of sorting algorithms is bounded by the off-chip memory bandwidth. The integration of high-bandwidth memory (HBM) on FPGAs offers significantly more off-chip memory bandwidth than traditional DDR memory, which opens new opportunities for accelerating sorting. In this paper, we develop Hypersort, a hardware accelerator for sorting on HBM-enabled FPGAs. We use Columnsort to merge HBM channels. To support the data communication patterns of Columnsort, we propose several optimizations that reduce external memory (HBM) traffic and hide data communication latency, further improving the overall throughput. We implement our accelerator on a state-of-the-art HBM-enabled FPGA. Experimental results show that our implementation achieves an overall sorting throughput of 34 GB/s, which is up to 14.8×, 4.73× and 2.18× faster than the state-of-the-art implementations on CPU, FPGA with external DDR, and HBM-enabled FPGA, respectively. The proposed approach demonstrates higher efficiency for merging sorted arrays in HBM channels than the state-of-the-art implementation on HBM-enabled FPGA.
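For readers unfamiliar with Columnsort, the following is a minimal software sketch of Leighton's eight-step algorithm, not the Hypersort hardware design, whose details the abstract does not give. It assumes the input is an r × s matrix stored in column-major order, with r divisible by s and r ≥ 2(s − 1)², and one could loosely think of each column as the data held in one HBM channel (an assumption made here for illustration only). The function name `columnsort` and its parameters are hypothetical.

```python
from math import inf


def columnsort(values, r, s):
    """Sort r*s values with Leighton's columnsort (illustrative sketch).

    `values` is read as an r x s matrix in column-major order; the result
    is the fully sorted sequence, also in column-major order.
    """
    if len(values) != r * s:
        raise ValueError("expected exactly r * s values")
    if r % s != 0 or r < 2 * (s - 1) ** 2:
        raise ValueError("columnsort needs r % s == 0 and r >= 2*(s-1)**2")

    def sort_columns(flat):
        # Each column occupies a contiguous run of r entries.
        out = []
        for start in range(0, len(flat), r):
            out.extend(sorted(flat[start:start + r]))
        return out

    n = r * s
    flat = sort_columns(list(values))                  # step 1: sort columns

    # Step 2: pick entries up in column-major order and lay them down in
    # row-major order. Entry k lands at row k // s, column k % s.
    laid = [None] * n
    for k, v in enumerate(flat):
        laid[(k % s) * r + k // s] = v
    flat = sort_columns(laid)                          # step 3: sort columns

    # Step 4: inverse of step 2.
    flat = [flat[(k % s) * r + k // s] for k in range(n)]
    flat = sort_columns(flat)                          # step 5: sort columns

    # Step 6: shift by half a column, padding with sentinels so that the
    # matrix temporarily has s + 1 columns.
    h = r // 2
    padded = [-inf] * h + flat + [inf] * (r - h)
    padded = sort_columns(padded)                      # step 7: sort columns

    return padded[h:h + n]                             # step 8: unshift
```

Only the column sorts do comparisons; steps 2, 4, 6 and 8 are fixed data permutations, which is why the communication pattern between columns is known in advance.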
