FPGA-Accelerated compression of integer vectors
Mahmoud Mohsen, Norman May, Christian Färber, David Broneske
Proceedings of the 16th International Workshop on Data Management on New Hardware, published 2020-06-14
DOI: 10.1145/3399666.3399932 (https://doi.org/10.1145/3399666.3399932)
Citations: 2
Abstract
Efficient compression of integer vectors is critical in dictionary-encoded column stores like SAP HANA to keep more data in the limited and precious main memory. Past research focused on lightweight compression techniques that accept lower compression ratios in exchange for low-latency data access. Consequently, only a few columns in a wide table benefit from lightweight and effective compression schemes like run-length encoding, prefix compression, or sparse encoding. Apart from bit-packing, the other columns remained uncompressed, which misses opportunities for a better compression ratio on many columns. Furthermore, compression has mainly been executed on the CPU because it involves heavy data transfer. With co-processors in particular, the data transfer overhead wipes out the performance gains of offloading. In this paper, we investigate whether we can achieve good compression ratios even for previously uncompressed columns by using binary packing and prefix suppression offloaded to an FPGA. As a streaming processor, an FPGA is a natural candidate for offloading the compression task. Our OpenCL-based implementation saturates the available PCIe bus during compression on the FPGA while using less than a third of the FPGA's resources. Furthermore, our real-world experiments against CPU-based SAP HANA show an improvement of around a factor of 2 in compression throughput, while compressing the data down to 60% of the size achieved by the best SAP HANA compression technique.
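To make the binary packing mentioned in the abstract more concrete, the following is a minimal CPU-side sketch in Python. It is not the authors' OpenCL/FPGA implementation. It assumes non-negative integers, a block size of 128 values, and a simple per-block layout of (bit width, value count, packed bytes); the function names and the layout are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

# Hypothetical block layout for this sketch: (bit width, value count, packed bytes).
Block = Tuple[int, int, bytes]


def bits_needed(value: int) -> int:
    """Number of bits required to store a non-negative integer (at least 1)."""
    return max(1, value.bit_length())


def pack_block(values: List[int], width: int) -> bytes:
    """Store only the low `width` bits of each value, back to back."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack_block(data: bytes, width: int, count: int) -> List[int]:
    """Inverse of pack_block: extract `count` values of `width` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def binary_pack(values: List[int], block_size: int = 128) -> List[Block]:
    """Split the vector into blocks and pack each block with its own bit width.

    The width of a block is determined by its largest value, so the leading
    zero bits shared by all values in the block are suppressed.
    """
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        width = bits_needed(max(block))
        blocks.append((width, len(block), pack_block(block, width)))
    return blocks


def binary_unpack(blocks: List[Block]) -> List[int]:
    """Decode all blocks back into the original integer vector."""
    out: List[int] = []
    for width, count, data in blocks:
        out.extend(unpack_block(data, width, count))
    return out


if __name__ == "__main__":
    # One block of small values followed by one block with larger values.
    vector = [3, 1, 2, 0] * 32 + [1000, 5] * 64
    packed = binary_pack(vector)
    assert binary_unpack(packed) == vector
    for width, count, data in packed:
        print(f"width={width} bits, values={count}, bytes={len(data)}")
```

Because the width is chosen per block, a block of small values is not penalized by a few large values elsewhere in the vector, which is the main difference from packing the whole column with one global bit width.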