Hathi: durable transactions for memory using flash
M. Saxena, Mehul A. Shah, S. Harizopoulos, M. Swift, A. Merchant
International Workshop on Data Management on New Hardware, 2012-05-21. DOI: https://doi.org/10.1145/2236584.2236589

Recent architectural trends (cheap, fast solid-state storage, inexpensive DRAM, and multi-core CPUs) provide an opportunity to rethink the interface between applications and persistent storage. To leverage these advances, we propose a new system architecture called Hathi that provides an in-memory transactional heap made persistent using high-speed flash drives. With Hathi, programmers can make consistent concurrent updates to in-memory data structures that survive system failures.

Hathi focuses on three major design goals: ACID semantics, a simple programming interface, and fine-grained programmer control. Hathi relies on software transactional memory to provide a simple concurrent interface to in-memory data structures, and extends it with persistent logs and checkpoints to add durability.

To reduce the cost of durability, Hathi uses two main techniques. First, it provides split-phase and partitioned commit interfaces that allow programmers to overlap commit I/O with computation and to avoid unnecessary synchronization. Second, it uses partitioned logging, which reduces contention on in-memory log buffers and exploits internal SSD parallelism. We find that our implementation of Hathi can achieve 1.25 million txns/s with a single SSD.
KISS-Tree: smart latch-free in-memory indexing on modern architectures
T. Kissinger, B. Schlegel, Dirk Habich, Wolfgang Lehner
International Workshop on Data Management on New Hardware, 2012-05-21. DOI: https://doi.org/10.1145/2236584.2236587

Growing main-memory capacities and an increasing number of hardware threads in modern server systems have led to fundamental changes in database architectures. Most importantly, query processing is nowadays performed on data that is often stored entirely in main memory. Despite high main-memory scan performance, index structures remain important components, but they have to be designed from scratch to cope with the specific characteristics of main memory and to exploit the high degree of parallelism. Current research has mainly focused on adapting block-optimized B+-Trees, but these data structures were designed for secondary storage and involve comprehensive structural maintenance for updates.

In this paper, we present the KISS-Tree, a latch-free in-memory index that is optimized for a minimum number of memory accesses and a high number of concurrent updates. More specifically, we aim for the same performance as modern hash-based algorithms while keeping the order-preserving nature of trees. We achieve this by using a prefix tree that incorporates virtual memory management functionality and compression schemes. In our experiments, we evaluate the KISS-Tree on different workloads and hardware platforms and compare the results to existing in-memory indexes. The KISS-Tree offers the highest reported read performance on current architectures, a balanced read/write performance, and a low memory footprint.
{"title":"GiST scan acceleration using coprocessors","authors":"F. Beier, T. Kilias, K. Sattler","doi":"10.1145/2236584.2236593","DOIUrl":"https://doi.org/10.1145/2236584.2236593","url":null,"abstract":"Efficient lookups in huge, possibly multi-dimensional datasets are crucial for the performance of numerous use cases that generate multiple search operations at the same time, like point queries in ray tracing or spatial joins in collision detection of interactive 3D applications. These applications greatly benefit from index structures that quickly filter relevant candidates for further processing. Since different lookup operations are independent from each other, they might be processed in parallel on modern hardware like multi-core CPUs or GPUs. But implementing efficient algorithms for all kinds of indexes on various hardware platforms is a challenging task. In this paper, we present a new approach that extends the existing GiST index framework with an abstraction layer for the hardware where index operations are executed. Furthermore, we provide first performance evaluations for the scan execution on CPUs and an Nvidia Tesla GPU.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128871135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing OLTP instruction misses with thread migration
Islam Atta, Pınar Tözün, A. Ailamaki, Andreas Moshovos
International Workshop on Data Management on New Hardware, 2012-05-21. DOI: https://doi.org/10.1145/2236584.2236586

During an instruction miss, a processor is unable to fetch instructions. The more frequent instruction misses are, the less able a modern processor is to find useful work to do, and performance suffers. Online transaction processing (OLTP) suffers from high instruction miss rates because the instruction footprint of OLTP transactions does not fit in today's L1-I caches. However, modern many-core chips have ample aggregate L1 cache capacity across multiple cores. Looking at the code paths that concurrently executing transactions follow, we observe a high degree of repetition both within and across transactions. This work presents TMi, a technique that uses thread migration to reduce instruction misses by spreading the footprint of a transaction over multiple L1 caches. TMi is a software-transparent hardware technique: it requires no code instrumentation and efficiently utilizes available cache capacity. This work evaluates TMi's potential and shows that it may reduce instruction misses by 51% on average. It also discusses the underlying tradeoffs and challenges, such as an increase in data misses, and points to potential solutions.
A comparison of the use of virtual versus physical snapshots for supporting update-intensive workloads
Darius Sidlauskas, Christian S. Jensen, Simonas Šaltenis
International Workshop on Data Management on New Hardware, 2012-05-21. DOI: https://doi.org/10.1145/2236584.2236585

Deployments of networked sensors fuel online applications that feed on real-time sensor data. This scenario calls for techniques that support the management of workloads containing queries as well as very frequent updates. This paper compares two well-chosen approaches to exploiting the parallelism offered by modern processors for supporting such workloads. A general approach to avoiding contention among parallel hardware threads, and thus exploiting the parallelism available in processors, is to maintain two copies, or snapshots, of the data: one for the relatively long-duration queries and one for the frequent and very localized updates. The snapshot that receives the updates is frequently made available to queries, so that queries see up-to-date data. The snapshots may be physical or virtual. Physical snapshots are created using the C library memcpy function. Virtual snapshots are created with the fork system call, which creates a new process that initially has the same data snapshot as the process it was forked from; when the new process carries out updates, the actual memory copying is triggered in a copy-on-write manner at memory-page granularity. This paper characterizes the circumstances under which each technique is preferable. The use of physical snapshots turns out to be surprisingly efficient.
{"title":"X-device query processing by bitwise distribution","authors":"H. Pirk, Thibault Sellam, S. Manegold, M. Kersten","doi":"10.1145/2236584.2236591","DOIUrl":"https://doi.org/10.1145/2236584.2236591","url":null,"abstract":"The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the \"most appropriate\" device. While pleasantly simple, this strategy has a number of problems: it may leave the \"inappropriate\" devices idle while overloading the \"appropriate\" device and putting a high pressure on the PCI bus. To address these issues we distribute data among the devices by partially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution makes a good cross-device processing strategy.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126389004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A case for micro-cellstores: energy-efficient data management on recycled smartphones","authors":"S. Harizopoulos, S. Papadimitriou","doi":"10.1145/1995441.1995448","DOIUrl":"https://doi.org/10.1145/1995441.1995448","url":null,"abstract":"Increased energy costs and concerns for sustainability make the following question more relevant than ever: can we turn old or unused computing equipment into cost- and energy-efficient modules that can be readily repurposed? We believe the answer is yes, and our proposal is to turn unused smartphones into micro-data center composable modules. In this paper, we introduce the concept of a Micro-Cellstore (MCS), a stand-alone data-appliance housing dozens of recycled smartphones. Through detailed power and performance measurements on a Linux-based current-generation smartphone, we assess the potential of MCSs as a data management platform. In this paper we focus on scan-based partitionable workloads. We show that smartphones are overall more energy efficient than recently proposed low-power alternatives, based on an initial evaluation over a wide range of single-node database scan workloads, and that the gains become more significant when operating on narrow tuples (i.e., column-stores, or compressed row-stores). Our initial results are very encouraging, showing efficiency gains of up to 6×, and indicate several promising future directions.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128350859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing recovery using an SSD buffer pool extension
Bishwaranjan Bhattacharjee, K. A. Ross, Christian A. Lang, G. Mihaila, M. Banikazemi
International Workshop on Data Management on New Hardware, 2011-06-13. DOI: https://doi.org/10.1145/1995441.1995443

Recent advances in solid-state technology have led to the introduction of solid-state drives (SSDs). Today's SSDs store data persistently using NAND flash memory and support good random-I/O performance. Current work on exploiting flash in database systems has primarily focused on using its random-I/O capability for second-level buffer pools below main memory. There has not been much emphasis on exploiting its persistence.

In this paper, we describe a mechanism, extending our previous work on an SSD buffer pool for a DB2 LUW prototype, that exploits SSD persistence for recovery and normal restart. We demonstrate significantly shorter recovery times and improved performance immediately after recovery completes. We quantify the overhead of supporting recovery and show that it is minimal.
{"title":"How to efficiently snapshot transactional data: hardware or software controlled?","authors":"Henrik Mühe, A. Kemper, Thomas Neumann","doi":"10.1145/1995441.1995444","DOIUrl":"https://doi.org/10.1145/1995441.1995444","url":null,"abstract":"The quest for real-time business intelligence requires executing mixed transaction and query processing workloads on the same current database state. However, as Harizopoulos et al. [6] showed for transactional processing, co-execution using classical concurrency control techniques will not yield the necessary performance -- even in re-emerging main memory database systems. Therefore, we designed an in-memory database system that separates transaction processing from OLAP query processing via periodically refreshed snapshots. Thus, OLAP queries can be executed without any synchronization and OLTP transaction processing follows the lock-free, mostly serial processing paradigm of H-Store [8]. In this paper, we analyze different snapshot mechanisms: Hardware-supported Page Shadowing, which lazily copies memory pages when changed by transactions, software controlled Tuple Shadowing, which generates a new version when a tuple is modified, software controlled Twin Tuple, which constantly maintains two versions of each tuple and HotCold Shadowing, which effectively combines Tuple Shadowing and hardware-supported Page Shadowing by clustering update-intensive objects. We evaluate their performance based on the mixed workload CH-BenCHmark which combines the TPC-C and the TPC-H benchmarks on the same database schema and state.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116283573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable aggregation on multicore processors","authors":"Yangang Ye, K. A. Ross, Norases Vesdapunt","doi":"10.1145/1995441.1995442","DOIUrl":"https://doi.org/10.1145/1995441.1995442","url":null,"abstract":"In data-intensive and multi-threaded programming, the performance bottleneck has shifted from I/O bandwidth to main memory bandwidth. The availability, size, and other properties of on-chip cache strongly influence performance. A key question is whether to allow different threads to work independently, or whether to coordinate the shared workload among the threads. The independent approach avoids synchronization overhead, but requires resources proportional to the number of threads and thus is not scalable. On the other hand, the shared method suffers from coordination overhead and potential contention.\u0000 In this paper, we aim to provide a solution to performing in-memory parallel aggregation on the Intel Nehalem architecture. We consider several previously proposed techniques that were evaluated on other architectures, including a hybrid independent/shared method and a method that clones data items automatically when contention is detected. We also propose two algorithms: partition-and-aggregate and PLAT. The PLAT and hybrid methods perform best overall, utilizing the computational power of multiple threads without needing memory proportional to the number of threads, and avoiding much of the coordination overhead and contention apparent in the shared table method.","PeriodicalId":298901,"journal":{"name":"International Workshop on Data Management on New Hardware","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115564088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}