数据中心中的fpga:并行混合超标量字符串样本排序的情况

Mikhail Asiatici, Damian Maiorano, P. Ienne
{"title":"数据中心中的fpga:并行混合超标量字符串样本排序的情况","authors":"Mikhail Asiatici, Damian Maiorano, P. Ienne","doi":"10.1109/ASAP49362.2020.00031","DOIUrl":null,"url":null,"abstract":"String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as sorting of fixed-length keys. Handling variable-length keys in hardware is challenging and it is no surprise that no string sorters on FPGA have been proposed yet. In this paper, we present Parallel Hybrid Super Scalar String Sample Sort (pHS5) on Intel HARPv2, a heterogeneous CPU-FPGA system with a server-grade multi-core CPU. Our pHS5 is based on the state-of-the-art string sorting algorithm for multi-core shared memory CPUs, pS5, which we extended with multiple processing elements (PEs) on the FPGA. Each PE accelerates one instance of the most effectively parallelizable dominant kernel of pS5 by up to 33% compared to a single Intel Xeon Broadwell core running at 3.4 GHz. Furthermore, we extended the job scheduling mechanism of pS5 to enable our PEs to compete with the CPU cores for processing the accelerable kernel, while retaining the complex high-level control flow and the sorting of the smaller data sets on the CPU. We accelerate the whole algorithm by up to 10% compared to the 28 thread software baseline running on the 14-core Xeon processor and by up to 36% at lower thread counts.","PeriodicalId":375691,"journal":{"name":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort\",\"authors\":\"Mikhail Asiatici, Damian Maiorano, P. Ienne\",\"doi\":\"10.1109/ASAP49362.2020.00031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as sorting of fixed-length keys. Handling variable-length keys in hardware is challenging and it is no surprise that no string sorters on FPGA have been proposed yet. In this paper, we present Parallel Hybrid Super Scalar String Sample Sort (pHS5) on Intel HARPv2, a heterogeneous CPU-FPGA system with a server-grade multi-core CPU. Our pHS5 is based on the state-of-the-art string sorting algorithm for multi-core shared memory CPUs, pS5, which we extended with multiple processing elements (PEs) on the FPGA. Each PE accelerates one instance of the most effectively parallelizable dominant kernel of pS5 by up to 33% compared to a single Intel Xeon Broadwell core running at 3.4 GHz. Furthermore, we extended the job scheduling mechanism of pS5 to enable our PEs to compete with the CPU cores for processing the accelerable kernel, while retaining the complex high-level control flow and the sorting of the smaller data sets on the CPU. We accelerate the whole algorithm by up to 10% compared to the 28 thread software baseline running on the 14-core Xeon processor and by up to 36% at lower thread counts.\",\"PeriodicalId\":375691,\"journal\":{\"name\":\"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"volume\":\"26 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASAP49362.2020.00031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP49362.2020.00031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

字符串排序是数据库和MapReduce应用程序的重要组成部分;然而,它还没有像定长键排序那样得到广泛的研究。在硬件中处理可变长度的键是具有挑战性的,因此在FPGA上没有提出字符串排序器也就不足为奇了。本文在Intel HARPv2上实现了并行混合标量字符串样本排序(pHS5), HARPv2是一种异构CPU- fpga系统,具有服务器级多核CPU。我们的pHS5基于用于多核共享内存cpu的最先进的字符串排序算法pS5,我们在FPGA上扩展了多个处理元素(pe)。与运行在3.4 GHz的单个Intel至强Broadwell内核相比,每个PE可将pS5最有效并行化主导内核的一个实例加速高达33%。此外,我们扩展了pS5的作业调度机制,使我们的pe能够与CPU内核竞争处理可加速内核,同时保留了CPU上复杂的高级控制流和较小数据集的排序。与在14核至强处理器上运行的28线程软件基准相比,我们将整个算法的速度提高了10%,在线程数较低的情况下,速度提高了36%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort
String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as sorting of fixed-length keys. Handling variable-length keys in hardware is challenging and it is no surprise that no string sorters on FPGA have been proposed yet. In this paper, we present Parallel Hybrid Super Scalar String Sample Sort (pHS5) on Intel HARPv2, a heterogeneous CPU-FPGA system with a server-grade multi-core CPU. Our pHS5 is based on the state-of-the-art string sorting algorithm for multi-core shared memory CPUs, pS5, which we extended with multiple processing elements (PEs) on the FPGA. Each PE accelerates one instance of the most effectively parallelizable dominant kernel of pS5 by up to 33% compared to a single Intel Xeon Broadwell core running at 3.4 GHz. Furthermore, we extended the job scheduling mechanism of pS5 to enable our PEs to compete with the CPU cores for processing the accelerable kernel, while retaining the complex high-level control flow and the sorting of the smaller data sets on the CPU. We accelerate the whole algorithm by up to 10% compared to the 28 thread software baseline running on the 14-core Xeon processor and by up to 36% at lower thread counts.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信