Terabyte Sort on FPGA-Accelerated Flash Storage

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). Pub Date: 2017-04-01. DOI: 10.1109/FCCM.2017.53

S. Jun, Shuotao Xu, Arvind


Citations: 30

Abstract

Sorting is one of the most fundamental and useful applications in computer science, and continues to be an important tool in analyzing large datasets. An important and challenging subclass of sorting problems involves sorting terabyte-scale datasets with hundreds of billions of records. The conventional method of sorting such large amounts of data is to distribute the data and computation over a cluster of machines. Such solutions can be fast but are often expensive and power-hungry. In this paper, we propose a solution based on flash storage connected to a collection of FPGA-based sorting accelerators that perform large-scale merge-sort in storage. The accelerators include highly efficient sorting networks and merge trees that use bitonic sorting to emit multiple sorted values every cycle. We show that by appropriate use of accelerators we can remove all the computation bottlenecks, so that end-to-end sorting performance is limited only by flash storage bandwidth. We demonstrate that our flash-based system matches the performance of existing distributed-cluster solutions of much larger scale. More importantly, our prototype shows almost twice the power efficiency of the existing JouleSort record holder. An optimized system with fewer wasteful components is projected to be four times more efficient than the current record holder, sorting over 200,000 records per joule of energy.
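The abstract names the building blocks (bitonic sorting networks and merge trees that emit several sorted keys per cycle) but not their exact design. The following is a minimal software model of that general technique, written under our own assumptions: `bitonic_sort` stands in for a small sorting network, and `vector_merge` models one two-way node of a merge tree that pushes 2k keys through a bitonic merger and emits k final keys per step. The vector width `k` (a power of two), the `+inf` padding, the "load the vector with the smaller first key" rule, and all function names are hypothetical illustrations, not the authors' hardware.

```python
import math
from typing import Iterator, List, Sequence


def bitonic_merge(seq: List[float]) -> List[float]:
    """Sort a bitonic sequence (power-of-two length) ascending via recursive half-cleaners."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    # Half-cleaner: one layer of compare-and-swap units at distance n/2.
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    return bitonic_merge(lo) + bitonic_merge(hi)


def bitonic_sort(seq: List[float]) -> List[float]:
    """Bitonic sorting network for a power-of-two number of keys."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    first = bitonic_sort(seq[:half])
    second = bitonic_sort(seq[half:])
    # An ascending run followed by a descending run is a bitonic sequence.
    return bitonic_merge(first + second[::-1])


def chunks(run: Sequence[float], k: int) -> List[List[float]]:
    """Split a sorted run into k-wide vectors, padding the tail with +inf sentinels."""
    padded = list(run) + [math.inf] * (-len(run) % k)
    return [padded[i:i + k] for i in range(0, len(padded), k)]


def vector_merge(a: Sequence[float], b: Sequence[float], k: int) -> Iterator[List[float]]:
    """Merge two sorted runs, yielding one k-wide sorted vector per step (one 'cycle')."""
    assert k >= 1 and k & (k - 1) == 0, "k must be a power of two"
    qa, qb = chunks(a, k), chunks(b, k)
    ia = ib = 0

    def take_next() -> List[float]:
        nonlocal ia, ib
        # Load the vector whose first key is smaller, or whichever stream remains.
        if ib >= len(qb) or (ia < len(qa) and qa[ia][0] <= qb[ib][0]):
            ia += 1
            return qa[ia - 1]
        ib += 1
        return qb[ib - 1]

    if not qa and not qb:
        return
    carry = take_next()
    while ia < len(qa) or ib < len(qb):
        x = take_next()
        merged = bitonic_merge(carry + x[::-1])  # 2k keys through one bitonic merger
        yield merged[:k]     # lower half is final output
        carry = merged[k:]   # upper half is fed back for the next step
    yield carry


def merge_runs(a: Sequence[float], b: Sequence[float], k: int = 4) -> List[float]:
    """Flatten the vector stream and drop the +inf padding (assumes finite input keys)."""
    out = [key for vec in vector_merge(a, b, k) for key in vec]
    return [key for key in out if key != math.inf]


if __name__ == "__main__":
    run_a = bitonic_sort([9, 3, 7, 1, 8, 2, 6, 4])
    run_b = bitonic_sort([5, 0, 11, 10])
    print(merge_runs(run_a, run_b, k=4))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

The merge step stays correct because every key already consumed is no larger than the first key of the vector not yet loaded, so the lower k outputs of each bitonic merge can never be undercut by later input. In hardware the same structure would be pipelined, and a tree of such two-way mergers would combine many sorted runs at once.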

