Hypersort: High-performance Parallel Sorting on HBM-enabled FPGA

2022 International Conference on Field-Programmable Technology (ICFPT)
Pub Date: 2022-12-05
DOI: 10.1109/ICFPT56656.2022.9974209

Soundarya Jayaraman, Bingyi Zhang, V. Prasanna


Citations: 0


Abstract

Accelerating sorting on FPGAs has been extensively studied by leveraging the fine-grained data parallelism of FPGAs. However, with optimized hardware pipelines, the performance of sorting algorithms is bounded by the off-chip memory bandwidth. The integration of high-bandwidth memory (HBM) on FPGAs offers significantly more off-chip memory bandwidth than traditional DDR memory, which creates new opportunities for accelerating sorting. In this paper, we develop Hypersort, a hardware accelerator for sorting on HBM-enabled FPGAs. We use Columnsort to merge the sorted arrays stored in different HBM channels. To support the data communication patterns of Columnsort, we propose several optimizations that reduce external memory (HBM) traffic and hide data communication latency, further improving overall throughput. We implement our accelerator on a state-of-the-art HBM-enabled FPGA. Experimental results show that our implementation achieves an overall sorting throughput of 34 GB/s, which is up to 14.8×, 4.73×, and 2.18× faster than the state-of-the-art implementations on CPU, FPGA with external DDR, and HBM-enabled FPGA, respectively. The proposed approach also merges sorted arrays across HBM channels more efficiently than the state-of-the-art implementation on an HBM-enabled FPGA.
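The abstract says only that Columnsort is used to merge HBM channels. For reference, here is a minimal software sketch of Leighton's textbook Columnsort. It is not the authors' hardware design. The input is viewed as an r × s matrix stored column by column, and eight steps (four independent column sorts separated by fixed permutations) produce a fully sorted sequence. Several choices are assumptions made only for illustration: each column stands for the data held by one HBM channel, Python's `sorted()` plays the role of the per-column sorter, and the helper names are hypothetical. The guard uses the commonly cited sufficient condition that s divides r and r ≥ 2s².

```python
import math
from typing import List, Sequence


def _sort_columns(cols: List[list]) -> List[list]:
    # Sort every column independently. Treating a column as one HBM
    # channel's data is an illustrative assumption, not the paper's design.
    return [sorted(c) for c in cols]


def _flatten(cols: List[list]) -> list:
    # Read the matrix in column-major order.
    return [x for c in cols for x in c]


def _chunk(seq: list, r: int) -> List[list]:
    # Write a sequence into columns of height r (column-major order).
    return [seq[k:k + r] for k in range(0, len(seq), r)]


def columnsort(data: Sequence, s: int) -> list:
    """Sort numeric `data` with Leighton's columnsort on an r x s matrix."""
    n = len(data)
    if s < 1 or n % s != 0:
        raise ValueError("len(data) must be a positive multiple of s")
    r = n // s
    if r % s != 0 or r < 2 * s * s:
        raise ValueError("columnsort needs s | r and r >= 2*s^2")

    # Step 1: sort each of the s columns.
    cols = _sort_columns(_chunk(list(data), r))
    # Step 2: "transpose": read column-major, write row-major.
    seq = _flatten(cols)
    cols = [[seq[i * s + j] for i in range(r)] for j in range(s)]
    # Step 3: sort each column.
    cols = _sort_columns(cols)
    # Step 4: inverse of step 2: read row-major, write column-major.
    seq = [cols[j][i] for i in range(r) for j in range(s)]
    # Step 5: sort each column.
    cols = _sort_columns(_chunk(seq, r))
    # Step 6: shift down by floor(r/2), padding with -inf on top and
    # +inf at the bottom, which yields s + 1 columns.
    half = r // 2
    seq = [-math.inf] * half + _flatten(cols) + [math.inf] * (r - half)
    # Step 7: sort each of the s + 1 columns.
    cols = _sort_columns(_chunk(seq, r))
    # Step 8: unshift, i.e. drop the padding and return the sorted data.
    return _flatten(cols)[half:half + n]
```

For example, `columnsort(list(range(16, 0, -1)), 2)` returns `[1, 2, ..., 16]`. In this sketch only steps 1, 3, 5, and 7 compare keys. The remaining steps are data-independent permutations, which is presumably why Columnsort suits a design where each column is sorted separately and the permutations turn into fixed memory-access patterns. The paper's optimizations for those communication patterns are not modeled here.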
