FPGA Based Co-design of Storage-side Query Filter for Big Data Systems
Jinyu Zhan, Ying Li, Wei Jiang, Jianping Zhu
2020 IEEE 33rd International System-on-Chip Conference (SOCC), 2020-09-08
DOI: 10.1109/socc49529.2020.9524801 (https://doi.org/10.1109/socc49529.2020.9524801)
Citation count: 0
Abstract
In this paper, we are interested in accelerating processing in big data systems. We consider big data architectures in which storage and computing are separated, and aim to improve data query efficiency on the storage side. We propose a Field Programmable Gate Array (FPGA) based co-design of a query filter on storage nodes to reduce the workload of computing nodes and the communication overhead between storage and computing nodes. The co-design of the query filter is composed of a software layer and an FPGA layer. In the software layer, we use pointers to project the data in RCFile format to reduce data transmission, and then formulate the combined predicate of the SQL conditions into parameters. In the FPGA layer, we design two filtering schemes for data in RCFile format, i.e., a parallel sequential filter and a parallel pipeline filter, with which different columns and SQL queries are processed completely in parallel. Based on the TPC-H benchmark and a Tencent data set, we conduct extensive experiments to evaluate our design, which saves on average 76.2% of the time overhead compared with Presto and 96.86% compared with Hive.
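To make the idea of storage-side filtering concrete, the following is a minimal software sketch in Python. It is not the authors' implementation: the abstract does not give the parameter encoding or the FPGA filter design, so the `Condition` record, the `OPS` opcode table, and the functions `filter_column` and `storage_side_filter` are all hypothetical names chosen for illustration. The sketch also assumes the combined predicate is a pure conjunction (AND) of per-column comparisons. It shows a column-oriented row group (in the spirit of RCFile), a predicate expressed as plain parameters, per-column evaluation that could in principle run independently for each column, and projection so that only the requested columns of matching rows leave the storage node.

```python
import operator
from dataclasses import dataclass

# Hypothetical opcode table: the abstract does not specify how SQL
# conditions are encoded as parameters, so this mapping is an assumption.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

@dataclass(frozen=True)
class Condition:
    """One SQL comparison reduced to parameters: column, operator, constant."""
    column: str
    op: str
    value: object

def filter_column(values, cond):
    """Evaluate one condition over one column and return a per-row boolean mask.

    Each column is processed independently of the others, which is the
    property a hardware design could exploit to filter columns in parallel.
    """
    compare = OPS[cond.op]
    return [compare(v, cond.value) for v in values]

def storage_side_filter(row_group, conditions, projected):
    """AND the per-column masks, then return only the projected columns
    of the rows that satisfy every condition."""
    n_rows = len(next(iter(row_group.values())))
    mask = [True] * n_rows
    for cond in conditions:
        col_mask = filter_column(row_group[cond.column], cond)
        mask = [m and c for m, c in zip(mask, col_mask)]
    return {
        name: [v for v, keep in zip(row_group[name], mask) if keep]
        for name in projected
    }

# Toy row group stored column by column; the column names follow the
# TPC-H lineitem table, but the values are made up for this example.
row_group = {
    "l_quantity": [17, 36, 8, 28],
    "l_discount": [0.04, 0.09, 0.10, 0.05],
    "l_extendedprice": [21168.23, 45983.16, 13309.60, 28955.64],
}

# WHERE l_quantity < 30 AND l_discount >= 0.05, selecting l_extendedprice
conditions = [
    Condition("l_quantity", "<", 30),
    Condition("l_discount", ">=", 0.05),
]
result = storage_side_filter(row_group, conditions, ["l_extendedprice"])
print(result)
```

In this toy run only the third and fourth rows satisfy both conditions, so a single column with two values is returned instead of three columns with four values each. This illustrates, at small scale, how filtering and projecting on the storage node can reduce the data sent to computing nodes.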