快速轻量级合并排序器

Philippos Papaphilippou, Chris Brooks, W. Luk
{"title":"快速轻量级合并排序器","authors":"Philippos Papaphilippou, Chris Brooks, W. Luk","doi":"10.1109/FPT.2018.00022","DOIUrl":null,"url":null,"abstract":"We have developed a highly-efficient and simple parallel hardware design for merging two sorted lists residing in banked (or multi-ported) memory. The FPGA implementation uses half the hardware resources required for implementing the current state-of-the-art architecture. This is achieved with better performance and half the latency, for the same amount of parallelism. The challenges for the merge operations in FPGAs have been the low clock frequency due to the feedback datapath of the merger being the critical path for timing, and also the high resource utilisation in recent attempts to eliminate/remove the feedback datapath. Our solution uses a modified version of the bitonic merge block, as found in a bitonic sorter, repurposed for performing parallel merge for streaming data. As with the state-of-the-art, it can be considered feedback-less since it only nests one parallel comparison for any desired level of parallelism. This leads to high operating frequency designs, 1.3 times higher than the previous best on our test platform. Since the new design uses 2 times fewer hardware resources, it allows more parallelism and leaves room for additional logic.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"FLiMS: Fast Lightweight Merge Sorter\",\"authors\":\"Philippos Papaphilippou, Chris Brooks, W. Luk\",\"doi\":\"10.1109/FPT.2018.00022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We have developed a highly-efficient and simple parallel hardware design for merging two sorted lists residing in banked (or multi-ported) memory. The FPGA implementation uses half the hardware resources required for implementing the current state-of-the-art architecture. This is achieved with better performance and half the latency, for the same amount of parallelism. The challenges for the merge operations in FPGAs have been the low clock frequency due to the feedback datapath of the merger being the critical path for timing, and also the high resource utilisation in recent attempts to eliminate/remove the feedback datapath. Our solution uses a modified version of the bitonic merge block, as found in a bitonic sorter, repurposed for performing parallel merge for streaming data. As with the state-of-the-art, it can be considered feedback-less since it only nests one parallel comparison for any desired level of parallelism. This leads to high operating frequency designs, 1.3 times higher than the previous best on our test platform. Since the new design uses 2 times fewer hardware resources, it allows more parallelism and leaves room for additional logic.\",\"PeriodicalId\":434541,\"journal\":{\"name\":\"2018 International Conference on Field-Programmable Technology (FPT)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 International Conference on Field-Programmable Technology (FPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FPT.2018.00022\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Field-Programmable Technology (FPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FPT.2018.00022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

摘要

我们已经开发了一个高效和简单的并行硬件设计,用于合并驻留在银行(或多端口)内存中的两个排序列表。FPGA实现使用了实现当前最先进架构所需硬件资源的一半。对于相同数量的并行性,可以实现更好的性能和一半的延迟。fpga中合并操作的挑战是低时钟频率,因为合并的反馈数据路径是定时的关键路径,以及最近试图消除/删除反馈数据路径的高资源利用率。我们的解决方案使用了bitonic合并块的修改版本,可以在bitonic排序器中找到,用于执行流数据的并行合并。与最先进的技术一样,它可以被认为是无反馈的,因为对于任何期望的并行度级别,它只嵌套一个并行比较。这导致了高工作频率设计,比我们测试平台上以前最好的设计高1.3倍。由于新设计使用的硬件资源减少了2倍,它允许更多的并行性,并为额外的逻辑留出空间。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
FLiMS: Fast Lightweight Merge Sorter
We have developed a highly-efficient and simple parallel hardware design for merging two sorted lists residing in banked (or multi-ported) memory. The FPGA implementation uses half the hardware resources required for implementing the current state-of-the-art architecture. This is achieved with better performance and half the latency, for the same amount of parallelism. The challenges for the merge operations in FPGAs have been the low clock frequency due to the feedback datapath of the merger being the critical path for timing, and also the high resource utilisation in recent attempts to eliminate/remove the feedback datapath. Our solution uses a modified version of the bitonic merge block, as found in a bitonic sorter, repurposed for performing parallel merge for streaming data. As with the state-of-the-art, it can be considered feedback-less since it only nests one parallel comparison for any desired level of parallelism. This leads to high operating frequency designs, 1.3 times higher than the previous best on our test platform. Since the new design uses 2 times fewer hardware resources, it allows more parallelism and leaves room for additional logic.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信