FPGA Based Co-design of Storage-side Query Filter for Big Data Systems

Jinyu Zhan, Ying Li, Wei Jiang, Jianping Zhu
Published in: 2020 IEEE 33rd International System-on-Chip Conference (SOCC), 2020-09-08. DOI: https://doi.org/10.1109/socc49529.2020.9524801

Abstract

This paper addresses the acceleration of query processing in big data systems. We consider big data systems in which storage and computing are separated, and aim to improve data query efficiency on the storage side. We propose a Field Programmable Gate Array (FPGA) based co-design of a query filter on storage nodes that reduces the workload of the computing nodes and the communication overhead between storage and computing nodes. The co-design consists of a software layer and an FPGA layer. In the software layer, we use pointers to project data in the RCFile format, which reduces data transmission, and then formulate the combined predicate of the SQL conditions into parameters. In the FPGA layer, we design two filtering schemes for data in the RCFile format, a parallel sequential filter and a parallel pipeline filter, with which different columns and different SQL queries are processed completely in parallel. Based on the TPC-H benchmark and a Tencent data set, we conduct extensive experiments to evaluate our design, which saves on average 76.2% of the time overhead compared with Presto and 96.86% compared with Hive.
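The software-layer idea of turning a combined SQL predicate into parameters and filtering column by column can be illustrated with a small software sketch. This is not the authors' implementation: the opcode table, the `Cond` record, the assumption that the predicate is given in disjunctive normal form (OR of AND-groups), and the function names are all hypothetical and chosen only for illustration. The row group is modelled as a plain dictionary of column lists, loosely mirroring the column-wise layout inside an RCFile row group.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical opcode table: each comparison is reduced to a small integer,
# the kind of parameter a hardware filter could decode.
OPS = {0: operator.eq, 1: operator.ne, 2: operator.lt,
       3: operator.le, 4: operator.gt, 5: operator.ge}


@dataclass(frozen=True)
class Cond:
    """One SQL condition as parameters: column name, opcode, constant."""
    column: str
    opcode: int
    constant: float


def column_mask(values: Sequence[float], cond: Cond) -> List[bool]:
    """Evaluate a single condition over one column. Each column is
    independent, so these masks could be computed in parallel."""
    compare = OPS[cond.opcode]
    return [compare(v, cond.constant) for v in values]


def filter_row_group(columns: Dict[str, Sequence[float]],
                     dnf: Sequence[Sequence[Cond]],
                     projection: Sequence[str]) -> List[Tuple]:
    """Apply a predicate in assumed disjunctive normal form and return
    only the projected columns of the matching rows."""
    n_rows = len(next(iter(columns.values())))
    selected = [False] * n_rows
    for group in dnf:                       # groups are OR-ed together
        group_mask = [True] * n_rows
        for cond in group:                  # conditions in a group are AND-ed
            mask = column_mask(columns[cond.column], cond)
            group_mask = [a and b for a, b in zip(group_mask, mask)]
        selected = [a or b for a, b in zip(selected, group_mask)]
    return [tuple(columns[c][i] for c in projection)
            for i in range(n_rows) if selected[i]]


# Made-up data, not taken from TPC-H or the Tencent data set.
columns = {
    "quantity": [10, 30, 5, 50],
    "discount": [0.05, 0.02, 0.07, 0.06],
    "price": [100.0, 200.0, 300.0, 400.0],
}
# (quantity < 24 AND discount >= 0.05) OR price > 350
predicate = [
    [Cond("quantity", 2, 24), Cond("discount", 5, 0.05)],
    [Cond("price", 4, 350.0)],
]
result = filter_row_group(columns, predicate, ["quantity", "price"])
print(result)
```

In this sketch only the projected columns of matching rows leave the filter, which is the kind of reduction in transferred data that the abstract attributes to storage-side filtering. The per-column masks are computed sequentially here; the parallel sequential and parallel pipeline schemes described in the abstract are hardware designs and are not modelled.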