Jan Böttcher, Viktor Leis, Jana Giceva, Thomas Neumann, A. Kemper
{"title":"Scalable and robust latches for database systems","authors":"Jan Böttcher, Viktor Leis, Jana Giceva, Thomas Neumann, A. Kemper","doi":"10.1145/3399666.3399908","DOIUrl":"https://doi.org/10.1145/3399666.3399908","url":null,"abstract":"Multi-core scalability is one of the most important features for database systems running on today's hardware. Not surprisingly, the implementation of locks is paramount to achieving efficient and scalable synchronization. In this work, we identify the key database-specific requirements for lock implementations and evaluate them using both micro-benchmarks and full-fledged database workloads. The results indicate that optimistic locking has superior performance in most workloads due to its minimal overhead and latency. By complementing optimistic locking with a pessimistic shared mode lock we demonstrate that we can also process HTAP workloads efficiently. Finally, we show how lock contention can be handled gracefully without slowing down the uncontended fast path or increasing space requirements by using a lightweight parking lot infrastructure.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"91 36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128814120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robert Lasch, S. Demirsoy, Norman May, V. Ramamurthy, Christian Färber, K. Sattler
{"title":"Accelerating re-pair compression using FPGAs","authors":"Robert Lasch, S. Demirsoy, Norman May, V. Ramamurthy, Christian Färber, K. Sattler","doi":"10.1145/3399666.3399931","DOIUrl":"https://doi.org/10.1145/3399666.3399931","url":null,"abstract":"Re-Pair is a compression algorithm well-suited for applications that require random accesses to compressed data, but has not found widespread use in the data management community due to its prohibitively high compression times. As Re-Pair is a computationally expensive algorithm and FPGAs are becoming more and more common to accelerate such problems in data centers, we devise an FPGA system that performs Re-Pair compression. The system is implemented in OpenCL, aside from a hash table and sorting component realized in RTL for more control over the synthesized hardware. Our experiments demonstrate that an Intel Arria® 10 GX FPGA with our system compresses an order of magnitude faster than a highly-optimized CPU version of Re-Pair. We discuss further optimization opportunities and argue that our system can scale to being deployed on a more resourceful FPGA for even better performance.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124868699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variable word length word-aligned hybrid compression","authors":"Florian Grieskamp, Roland Kühn, J. Teubner","doi":"10.1145/3399666.3399935","DOIUrl":"https://doi.org/10.1145/3399666.3399935","url":null,"abstract":"The Word-Aligned Hybrid (WAH) compression is a prominent example of a lightweight compression scheme for bitmap indices that considers the word size of the underlying architecture. This is a compromise toward commodity CPUs, where operations below the word granularity perform poorly. With the emergence of novel hardware classes, such compromises may no longer be appropriate. Field-programmable gate arrays (FPGAs) do not even have any meaningful \"word size\". In this work, we reconsider strategies for bitmap compression in the light of modern hardware architectures. Rather than tuning compression toward a fixed word size, we propose to tune the word size toward optimal compression. The resulting compression scheme, Variable Word Length Word-Aligned Hybrid (VWLWAH), improves compression rates by almost 75% while maintaining line rate performance on FPGAs.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129427329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hawon Chu, Seounghyun Kim, Joo-Young Lee, Young-Kyoon Suh
{"title":"Empirical evaluation across multiple GPU-accelerated DBMSes","authors":"Hawon Chu, Seounghyun Kim, Joo-Young Lee, Young-Kyoon Suh","doi":"10.1145/3399666.3399907","DOIUrl":"https://doi.org/10.1145/3399666.3399907","url":null,"abstract":"In this paper we conduct an empirical study across modern GPU-accelerated DBMSes with TPC-H workloads. Our rigorous experiments demonstrate that the studied DBMSes appear to utilize GPU resources effectively but neither scale well with growing databases nor are fully capable of processing some complex analytical queries. Thus, we claim that the GPU DBMSes still need to be further engineered to achieve better analytical performance.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121990275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mahmoud Mohsen, Norman May, Christian Färber, David Broneske
{"title":"FPGA-Accelerated compression of integer vectors","authors":"Mahmoud Mohsen, Norman May, Christian Färber, David Broneske","doi":"10.1145/3399666.3399932","DOIUrl":"https://doi.org/10.1145/3399666.3399932","url":null,"abstract":"An efficient compression of integer vectors is critical in dictionary-encoded column stores like SAP HANA to keep more data in the limited and precious main memory. Past research focused on lightweight compression techniques that trade low latency of data accesses for lower compression ratios. Consequently, only a few columns in a wide table benefit from lightweight and effective compression schemes like run-length encoding, prefix compression or sparse encoding. Besides bit-packing, other columns remained uncompressed, which clearly misses opportunities for a better compression ratio for many columns. Furthermore, the main executor for compression was the CPU as compression involves heavy data transfer. Especially when used with co-processors, the data transfer overhead wipes out performance gains from co-processor usage. In this paper, we investigate whether we can achieve good compression ratios even for previously uncompressed columns by using binary packing and prefix suppression offloaded to an FPGA. As a streaming-processor, an FPGA is the perfect candidate to outsource the compression task. As a result of our OpenCL-based implementation, we achieve a saturation of the available PCIe bus during compression on the FPGA, by using less than a third of the FPGA's resources. Furthermore, our real-world experiments against CPU-based SAP HANA show a performance improvement of around a factor of 2 in compression throughput while compressing the data down to 60% of the best SAP HANA compression technique.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133595610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient generation of machine code for query compilers","authors":"Henning Funke, J. Mühlig, J. Teubner","doi":"10.1145/3399666.3399925","DOIUrl":"https://doi.org/10.1145/3399666.3399925","url":null,"abstract":"Query compilation can make query execution extremely efficient, but it introduces additional compilation time. The compilation time causes a relatively high overhead especially for short-running and high-complexity queries. We propose Flounder IR as a lightweight intermediate representation for query compilation to reduce compilation times. Flounder IR is close to machine assembly and adds just that set of features that is necessary for efficient query compilation: virtual registers and function calls ease the construction of the compiler front-end; database-specific extensions enable efficient pipelining in query plans; more elaborate IR features are intentionally left out to maximize compilation speed. In this paper, we present the Flounder IR language and motivate its design; we show how the language makes query compilation intuitive and efficient; and we demonstrate with benchmarks how our Flounder library can significantly reduce query compilation times.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116233621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mehdi Moghaddamfar, Christian Färber, Wolfgang Lehner, Norman May
{"title":"Comparative analysis of OpenCL and RTL for sort-merge primitives on FPGA","authors":"Mehdi Moghaddamfar, Christian Färber, Wolfgang Lehner, Norman May","doi":"10.1145/3399666.3399897","DOIUrl":"https://doi.org/10.1145/3399666.3399897","url":null,"abstract":"As a result of recent improvements in FPGA technology, their benefits for highly efficient data processing pipelines are becoming more and more apparent. However, traditional RTL methods for programming FPGAs require knowledge of digital design and hardware description languages. OpenCL™ provides software developers with a C-based platform for implementing their applications without deep knowledge of digital design. In this paper, we conduct a comparative analysis of OpenCL and RTL-based implementations of a novel heapsort with merging sorted runs. In particular, we quantitatively compare their performance, FPGA resource utilization, and development effort. Our results show that while requiring comparable development effort, RTL implementations of critical primitives used in the algorithm achieve 4X better performance while using half as many FPGA resources.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128204560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tobias Vinçon, Arthur Bernhardt, Ilia Petrov, Lukas Weber, Andreas Koch
{"title":"nKV","authors":"Tobias Vinçon, Arthur Bernhardt, Ilia Petrov, Lukas Weber, Andreas Koch","doi":"10.1145/3399666.3399934","DOIUrl":"https://doi.org/10.1145/3399666.3399934","url":null,"abstract":"Massive data transfers in modern key/value stores resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution, which although not new, have yet to see widespread use. In this paper we introduce nKV, which is a key/value store utilizing native computational storage and near-data processing. On the one hand, nKV can directly control the data and computation placement on the underlying storage hardware. On the other hand, nKV propagates the data formats and layouts to the storage device, where software and hardware parsers and accessors are implemented. Both allow NDP operations to execute in a host-intervention-free manner, directly on physical addresses, and thus better utilize the underlying hardware. Our performance evaluation is based on executing traditional KV operations (GET, SCAN) and on complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4X-2.7X better performance on real hardware - the COSMOS+ platform [22].","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116519472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Let's add transactions to FPGA-based key-value stores!","authors":"Z. István","doi":"10.1145/3399666.3399909","DOIUrl":"https://doi.org/10.1145/3399666.3399909","url":null,"abstract":"In recent years we have seen a proliferation of FPGA-based key value stores (KVSs) [1--3, 5--7, 10] driven by the need for more efficient large-scale data management and storage solutions. In this context, FPGAs are useful because they offer network-bound performance even with small key-value pairs and near-data processing in a fraction of the energy budget of regular servers. Even though the first FPGA-based key-value stores started appearing already in 2013 and have evolved significantly in the meantime, almost no attention has been paid to offering transactions. Today, however, now that such systems are becoming increasingly practical, we need to ensure consistency guarantees for concurrent clients (transactions). This position paper makes the case that adding transaction support is not particularly expensive, compared to other parts of these systems, and in the future all FPGA-based KVSs should provide some form of transactional guarantees. In the remainder of this paper we present a high-level view of the typical pipelined architecture of FPGA-based KVSs that most existing designs follow, and show three different ways of implementing transactions, with increasing sophistication: from operation batching, through two-phase locking (2PL), to a simplified snapshot isolation model.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"229 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123190470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stefan Noll, J. Teubner, Norman May, Alexander Böhm
{"title":"Analyzing memory accesses with modern processors","authors":"Stefan Noll, J. Teubner, Norman May, Alexander Böhm","doi":"10.1145/3399666.3399896","DOIUrl":"https://doi.org/10.1145/3399666.3399896","url":null,"abstract":"Debugging and tuning database systems is very challenging. Using common profiling tools is often not sufficient because they identify the machine instruction rather than the instance of a data structure that causes a performance problem. This leaves a problem's root cause such as memory hotspots or poor data layouts hidden. The state-of-the-art solution is to augment classical profiling with a memory trace. However, current approaches for collecting memory traces are not usable in practice due to their large runtime overhead. In this work, we leverage a mechanism available in modern processors to collect memory traces via hardware-based sampling. We evaluate our approach using a commercial and an open-source database system running the JCC-H benchmark. In particular, we demonstrate that our approach is practical due to its low runtime overhead and we illustrate how memory traces uncover new insights into the memory access characteristics of database systems.","PeriodicalId":256784,"journal":{"name":"Proceedings of the 16th International Workshop on Data Management on New Hardware","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123325720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}