Hardware-Conscious Sliding Window Aggregation on GPUs
G. Michas, Periklis Chrysogelos, Ioannis Mytilinis, A. Ailamaki
Proceedings of the 17th International Workshop on Data Management on New Hardware, 2021-06-20. DOI: 10.1145/3465998.3466014 (https://doi.org/10.1145/3465998.3466014)
Abstract: Stream Processing Engines (SPEs) have recently begun utilizing heterogeneous coprocessors (e.g., GPUs) to meet the velocity requirements of modern real-time applications. The massive parallelism and high memory bandwidth of GPUs can significantly increase processing throughput in data-intensive streaming scenarios, such as windowed aggregations. However, previous research focused only on the overall architecture of hybrid CPU-GPU streaming systems, and the need for efficient in-GPU window operators was overshadowed by the limited interconnect bandwidth. With aggregation taking up a significant portion of streaming workloads, in this work we analyze and optimize the performance of sliding window aggregates on GPUs. Current implementations under-utilize the hardware and, for a range of query parameters, cannot even saturate the bandwidth of the interconnect. To optimize execution, we first evaluate the fundamental building blocks of streaming aggregation for GPUs and identify the performance bottlenecks. Then, we build Slider: an adaptive algorithm that selects the most appropriate primitives and kernel configurations based on the query parameters. Our evaluation shows that Slider outperforms previous approaches by 3×-1250× and saturates both the interconnect and the memory bandwidth for a wide range of examined input workloads.

Reducing Bloom Filter CPU Overhead in LSM-Trees on Modern Storage Devices
Zichen Zhu, J. Mun, Aneesh Raman, Manos Athanassoulis
Proceedings of the 17th International Workshop on Data Management on New Hardware, 2021-06-20. DOI: 10.1145/3465998.3466002 (https://doi.org/10.1145/3465998.3466002)
Abstract: Bloom filters (BFs) accelerate point lookups in Log-Structured Merge (LSM) trees by reducing unnecessary storage accesses to levels that do not contain the desired key. BFs are particularly beneficial when there is a significant performance difference between querying a BF (hashing and accessing memory) and accessing data on secondary storage. This gap, however, is decreasing as modern storage devices (SSDs and NVMs) have increasingly lower latency, to the point that the cost of accessing data can be comparable to that of filter probing and hashing, especially for large key sizes that exhibit high hashing cost. In an LSM-tree, BFs are employed when querying each level of the tree, thus exacerbating the CPU cost as the data size (and thus the tree height) grows. To address the increasing CPU cost of BFs in LSM-trees, we propose to re-use hash calculations aggressively within and across BFs, as well as between different levels, and we show both analytically and experimentally that we can maintain a close-to-ideal false positive rate while significantly reducing the runtime. The reduced CPU cost for queries using the proposed hash sharing leads to 10% higher lookup performance in an LSM-tree with 22GB of data (5 levels) stored on a state-of-the-art PCIe SSD. The benefit further increases for faster underlying storage: for faster NVM devices, hash sharing leads to performance gains of up to 40%.

The Case for SIMDified Analytical Query Processing on GPUs
Johannes Fett, A. Ungethüm, Dirk Habich, Wolfgang Lehner
Proceedings of the 17th International Workshop on Data Management on New Hardware, 2021-06-20. DOI: 10.1145/3465998.3466015 (https://doi.org/10.1145/3465998.3466015)
Abstract: Data-level parallelism (DLP) is a heavily used hardware-driven parallelization technique for optimizing analytical query processing, especially in in-memory column stores. This kind of parallelism is characterized by executing essentially the same operation on different data elements simultaneously. Besides Single Instruction Multiple Data (SIMD) extensions on common x86 processors, GPUs also provide DLP, but with a different execution model called Single Instruction Multiple Threads (SIMT), where multiple scalar threads are executed in a SIMD manner. Unfortunately, a complete GPU-specific implementation of all query operators currently has to be built from scratch, since vectorized operator implementations for x86 processors cannot yet be ported to GPUs. To avoid this implementation effort, in this paper we present our vision of virtualizing GPUs as virtual vector engines with software-defined SIMD instructions and of specializing hardware-oblivious vectorized operators to GPUs using our Template Vector Library (TVL).

A cost model for NDP-aware query optimization for KV-stores
Christian Knödler, Tobias Vinçon, Arthur Bernhardt, Ilia Petrov, Leonardo Solis-Vasquez, Lukas Weber, Andreas Koch
Proceedings of the 17th International Workshop on Data Management on New Hardware, 2021-06-20. DOI: 10.1145/3465998.3466013 (https://doi.org/10.1145/3465998.3466013)
Abstract: Many modern DBMS architectures require transferring data from storage in order to process it afterwards. Given the continuously increasing amounts of data, data transfers quickly become a scalability-limiting factor. Near-Data Processing and smart/computational storage emerge as promising trends, allowing for decoupled in-situ operation execution, reduced data transfers, and better bandwidth utilization. However, not every operation is suitable for in-situ execution, and careful placement and optimization are needed. In this paper we present an NDP-aware cost model. It has been implemented in MySQL and evaluated with nKV. We make several observations underscoring the need for optimization.
