Fast scans and joins using flash drives
Mehul A. Shah, S. Harizopoulos, J. Wiener, G. Graefe. International Workshop on Data Management on New Hardware, 2008-06-13. DOI: 10.1145/1457150.1457154

As access times to main memory and disks continue to diverge, faster non-volatile storage technologies become more attractive for speeding up data analysis applications. NAND flash is one such promising substitute for disks. Flash offers faster random reads than disk, consumes less power than disk, and is cheaper than DRAM. In this paper, we investigate alternative data layouts and join algorithms suited for systems that use flash drives as the non-volatile store.

All of our techniques take advantage of the fast random reads of flash. We convert traditional sequential I/O algorithms to ones that use a mixture of sequential and random I/O to process less data in less time. Our measurements on commodity flash drives show that a column-major layout of data pages is faster than a traditional row-based layout for simple scans. We present a new join algorithm, RARE-join, designed for a column-based page layout on flash and compare it to a traditional hash join algorithm. Our analysis shows that RARE-join is superior in many practical cases: when join selectivities are small and only a few columns are projected in the join result.

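The benefit of a column-major page layout for scans can be seen with a back-of-the-envelope byte count. The sketch below uses hypothetical table dimensions, not the paper's measurements:

```python
# Illustrative only: bytes a scan must read when projecting one column,
# under a row-major (NSM) layout vs. a column-major layout. Flash's fast
# random reads make it cheap to fetch only the projected columns' pages.

ROWS, COLS, COL_WIDTH = 1_000_000, 10, 8  # hypothetical table shape

def bytes_scanned_row_major(projected_cols):
    # Row-major pages hold whole tuples: a scan reads every column
    # regardless of the projection.
    return ROWS * COLS * COL_WIDTH

def bytes_scanned_column_major(projected_cols):
    # Column-major pages hold one column each: read only what is projected.
    return ROWS * len(projected_cols) * COL_WIDTH

row_bytes = bytes_scanned_row_major([0])     # 80 MB for a 1-column scan
col_bytes = bytes_scanned_column_major([0])  # 8 MB for the same scan
```

With one of ten equal-width columns projected, the column-major scan touches a tenth of the data, which is the effect the paper measures on commodity flash drives.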
Avoiding version redundancy for high performance reads in temporal databases
Khaled Jouini, G. Jomier. International Workshop on Data Management on New Hardware, 2008-06-13. DOI: 10.1145/1457150.1457159

A major performance bottleneck for database systems is the memory hierarchy. Its performance is directly related to how the content of disk pages maps to L2 cache lines, i.e. to the organization of data within a disk page, called the page layout. The prevalent page layout in database systems is the N-ary Storage Model (NSM). As demonstrated in this paper, using NSM for temporal data deteriorates memory hierarchy performance for query-intensive workloads. This paper proposes two cache-conscious, read-optimized page layouts for temporal data. Experiments show that the proposed page layouts are substantially faster than NSM.

DSM vs. NSM: CPU performance tradeoffs in block-oriented query processing
M. Zukowski, N. Nes, P. Boncz. International Workshop on Data Management on New Hardware, 2008-06-13. DOI: 10.1145/1457150.1457160

Comparisons between the merits of row-wise storage (NSM) and columnar storage (DSM) are typically made with respect to the persistent storage layer of database systems. In this paper, however, we focus on the CPU efficiency tradeoffs of tuple representations inside the query execution engine, while tuples flow through a processing pipeline. We analyze the performance in the context of query engines using so-called "block-oriented" processing -- a recently popularized technique that can strongly improve CPU efficiency. With this high efficiency, the performance trade-offs between NSM and DSM can have a decisive impact on query execution performance, as we demonstrate using both microbenchmarks and TPC-H query 1. This means that NSM-based database systems can sometimes benefit from converting tuples into DSM on-the-fly, and vice versa.

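The on-the-fly conversion the abstract mentions amounts to transposing a block of tuples into one vector per attribute, so that subsequent operators can run tight per-column loops. A minimal sketch, not the paper's engine:

```python
# Hypothetical NSM block: a list of row-wise tuples flowing through the
# pipeline in a single batch ("block-oriented" processing).
nsm_block = [
    (1, "2008-06-13", 9.99),
    (2, "2008-06-13", 4.50),
    (3, "2008-06-14", 7.25),
]

# On-the-fly NSM -> DSM: transpose the block into one vector per attribute.
dsm_block = list(map(list, zip(*nsm_block)))

# Column-at-a-time work on the price vector: a simple, branch-free loop
# over contiguous values, which is where DSM wins on CPU efficiency.
prices = dsm_block[2]
total = sum(prices)
```

The reverse direction (DSM back to NSM) is the same transpose, which is why the trade-off can be evaluated per operator rather than fixed for the whole engine.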
Modeling the performance of algorithms on flash memory devices
K. A. Ross. International Workshop on Data Management on New Hardware, 2008-06-13. DOI: 10.1145/1457150.1457153

NAND flash memory is fast becoming popular as a component of large scale storage devices. For workloads requiring many random I/Os, flash devices can provide two orders of magnitude increased performance relative to magnetic disks. Flash memory has some unusual characteristics. In particular, general updates require a page write, while updates of 1 bits to 0 bits can be done in-place. In order to measure how well algorithms perform on such a device, we propose the "EWOM" model for analyzing algorithms on flash memory devices. We introduce flash-aware algorithms for counting, list management, and B-trees, and analyze them using the EWOM model. This analysis shows that one can use the incremental 1-to-0 update properties of flash memory in interesting ways to reduce the required number of page-write operations.

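The 1-to-0 update property can be illustrated with a toy unary counter: a freshly erased region is all ones, each increment clears one more bit in place, and an expensive erase is needed only when the region is exhausted. This is a hypothetical sketch in the spirit of the abstract's counting example, not the paper's EWOM analysis:

```python
# Toy flash-friendly counter: bits may change 1 -> 0 in place at any time,
# but restoring 0 -> 1 requires erasing the whole region.

class FlashCounter:
    def __init__(self, nbits=64):
        self.bits = [1] * nbits  # freshly erased region: all ones
        self.erases = 0

    def increment(self):
        try:
            # Clear the leftmost remaining 1 bit: a cheap in-place update.
            self.bits[self.bits.index(1)] = 0
        except ValueError:
            # Region exhausted: pay for an erase, then record one increment.
            self.bits = [1] * len(self.bits)
            self.erases += 1
            self.bits[0] = 0

    @property
    def value(self):
        # Cleared bits in the current region plus fully consumed regions.
        return self.bits.count(0) + self.erases * len(self.bits)

c = FlashCounter(nbits=8)
for _ in range(10):
    c.increment()
```

Ten increments cost only one erase here; a naive binary counter stored in the same region would need an erase almost every time a bit had to flip back from 0 to 1.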
CAM conscious integrated answering of frequent elements and top-k queries over data streams
Sudipto Das, D. Agrawal, A. E. Abbadi. International Workshop on Data Management on New Hardware, 2008-06-13. DOI: 10.1145/1457150.1457152

Frequent elements and top-k queries constitute an important class of queries for data stream analysis applications. Certain applications require answers for both frequent elements and top-k queries on the same stream. In addition, the ever increasing data rates call for providing fast answers to the queries, and researchers have been looking towards exploiting specialized hardware for this purpose. Content Addressable Memory (CAM) provides an efficient way of looking up elements and is hence well suited for the class of algorithms that involve lookups. In this paper, we present a fast and efficient CAM conscious integrated solution for answering both frequent elements and top-k queries on the same stream. We call our scheme CAM conscious Space Saving with Stream Summary (CSSwSS). We provide an implementation of the proposed scheme using commodity CAM chips, and the experimental evaluation demonstrates that not only does the proposed scheme outperform existing CAM conscious techniques by an order of magnitude at query loads of about 10%, but it can also efficiently answer continuous queries.

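The Space Saving algorithm (Metwally et al.) that CSSwSS builds on can be sketched in plain software, leaving aside the CAM lookup hardware that the paper's scheme exploits:

```python
# Software sketch of the Space Saving frequent-elements algorithm; the
# paper's contribution is making its lookups CAM-conscious, not this loop.

def space_saving(stream, k):
    """Approximate counts of frequent elements using at most k counters."""
    counters = {}  # element -> approximate (over)count
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # Evict the minimum counter and reuse it for the new element,
            # inheriting its count as an overestimate bound.
            victim = min(counters, key=counters.get)
            counters[x] = counters.pop(victim) + 1
    return counters

counts = space_saving(["a", "b", "a", "c", "a", "b"], k=2)
```

Each counter is an upper bound on the element's true frequency, which is what lets the same summary answer both frequent-elements and top-k queries.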
A general framework for improving query processing performance on multi-level memory hierarchies
Bingsheng He, Yinan Li, Qiong Luo, Dongqing Yang. International Workshop on Data Management on New Hardware, 2007-06-15. DOI: 10.1145/1363189.1363193

We propose a general framework for improving the query processing performance on multi-level memory hierarchies. Our motivation is that (1) the memory hierarchy is an important performance factor for query processing, (2) both the memory hierarchy and database systems are becoming increasingly complex and diverse, and (3) increasing the amount of tuning does not always improve the performance. Therefore, we categorize multiple levels of memory performance tuning and quantify their performance impacts. As a case study, we use this framework to improve the in-memory performance of storage models, B+-trees, nested-loop joins and hash joins. Our empirical evaluation verifies the usefulness of the proposed framework.

In-memory grid files on graphics processors
Ke Yang, Bingsheng He, Rui Fang, Mian Lu, N. Govindaraju, Qiong Luo, P. Sander, Jiaoying Shi. International Workshop on Data Management on New Hardware, 2007-06-15. DOI: 10.1145/1363189.1363196

Recently, graphics processing units, or GPUs, have become a viable alternative as commodity, parallel hardware for general-purpose computing, due to their massive data-parallelism, high memory bandwidth, and improved general-purpose programming interface. In this paper, we explore the use of the GPU on the grid file, a traditional multidimensional access method. Considering the hardware characteristics of GPUs, we design a massively multi-threaded GPU-based grid file for static, memory-resident multidimensional point data. Moreover, we propose a hierarchical grid file variant to handle data skews efficiently. Our implementations on the NVIDIA G80 GTX graphics card are able to achieve two to eight times higher performance than their CPU counterparts on a single PC.

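A grid file partitions space into a regular grid of buckets so that a range query visits only the overlapping cells. The minimal sequential sketch below shows the access method itself; the paper's contribution is running this structure under thousands of GPU threads and adding a hierarchical variant for skew:

```python
# Illustrative in-memory grid file for 2-D points in [0, 1) x [0, 1);
# sequential sketch, not the authors' GPU implementation.

class GridFile:
    def __init__(self, points, cells=4, extent=1.0):
        self.cells, self.extent = cells, extent
        self.buckets = {}  # (cx, cy) -> list of points in that cell
        for p in points:
            self.buckets.setdefault(self._cell(p), []).append(p)

    def _cell(self, p):
        scale = self.cells / self.extent
        return (min(int(p[0] * scale), self.cells - 1),
                min(int(p[1] * scale), self.cells - 1))

    def range_query(self, lo, hi):
        # Visit only the buckets overlapping the query rectangle, then
        # filter exactly within each bucket.
        cl, ch = self._cell(lo), self._cell(hi)
        out = []
        for cx in range(cl[0], ch[0] + 1):
            for cy in range(cl[1], ch[1] + 1):
                out += [p for p in self.buckets.get((cx, cy), ())
                        if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]
        return out

gf = GridFile([(0.1, 0.1), (0.2, 0.9), (0.8, 0.8)])
hits = gf.range_query((0.0, 0.0), (0.5, 0.5))
```

Because each bucket can be probed independently, the per-cell filtering step parallelizes naturally, which is what makes the structure a good fit for data-parallel hardware.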
The five-minute rule twenty years later, and how flash memory changes the rules
G. Graefe. International Workshop on Data Management on New Hardware, 2007-06-15. DOI: 10.1145/1363189.1363198

In 1987, Gray and Putzolo presented the five-minute rule, which was reviewed and renewed ten years later in 1997. With the advent of flash memory in the gap between traditional RAM main memory and traditional disk systems, the five-minute rule now applies to large pages appropriate for today's disks and their fast transfer bandwidths, and it also applies to flash disks holding small pages appropriate for their fast access latency.

Flash memory fills the gap between RAM and disks in terms of many metrics: acquisition cost, access latency, transfer bandwidth, spatial density, and power consumption. Thus, within a few years, flash memory will likely be used heavily in operating systems, file systems, and database systems. Research into appropriate system architectures is urgently needed.

The basic software architectures for exploiting flash in these systems are called "extended buffer pool" and "extended disk" here. Based on the characteristics of these software architectures, an argument is presented why operating systems and file systems on one hand and database systems on the other hand will best benefit from flash memory by employing different software architectures.

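Gray and Putzolo's rule balances the rent of a RAM page against the disk accesses it saves: keep a page cached if it is re-accessed within the break-even interval. A worked form of that calculation, with illustrative prices (the specific numbers below are not from the paper):

```python
# Break-even interval from the five-minute rule: cache a page in RAM if it
# is re-referenced at least this often (in seconds).

def break_even_seconds(pages_per_mb_ram, price_per_disk,
                       accesses_per_sec_disk, price_per_mb_ram):
    # RAM cost per cached page vs. the amortized cost of the disk
    # accesses that caching avoids.
    return (pages_per_mb_ram * price_per_disk) / (
        accesses_per_sec_disk * price_per_mb_ram)

# Hypothetical numbers: 4 KB pages (256 per MB), an $80 disk sustaining
# 100 random accesses/sec, RAM at $0.05 per MB.
interval = break_even_seconds(pages_per_mb_ram=256, price_per_disk=80.0,
                              accesses_per_sec_disk=100.0,
                              price_per_mb_ram=0.05)
```

The paper's point is that plugging in flash prices, latencies, and page sizes shifts this interval in opposite directions for the RAM-to-flash and flash-to-disk boundaries.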
Vectorized data processing on the cell broadband engine
S. Héman, N. Nes, M. Zukowski, P. Boncz. International Workshop on Data Management on New Hardware, 2007-06-15. DOI: 10.1145/1363189.1363195

In this work, we research the suitability of the Cell Broadband Engine for database processing. We start by outlining the main architectural features of Cell and use micro-benchmarks to characterize the latency and throughput of its memory infrastructure. Then, we discuss the challenges of porting RDBMS software to Cell: (i) all computations need to be SIMD-ized, (ii) all performance-critical branches need to be eliminated, (iii) a very small and hard limit on program code size should be respected.

While we argue that conventional database implementations, i.e. row-stores with Volcano-style tuple pipelining, are a hard fit to Cell, it turns out that the three challenges are quite easily met in databases that use column-wise processing. We managed to implement a proof-of-concept port of the vectorized query processing model of MonetDB/X100 on Cell by running the operator pipeline on the PowerPC, but having it execute the vectorized primitives (data parallel) on its SPE cores. A performance evaluation on TPC-H Q1 shows that vectorized query processing on Cell can beat conventional PowerPC and Itanium2 CPUs by a factor of 20.

Parallel buffers for chip multiprocessors
J. Cieslewicz, K. A. Ross, Ioannis Giannakakis. International Workshop on Data Management on New Hardware, 2007-06-15. DOI: 10.1145/1363189.1363192

Chip multiprocessors (CMPs) present new opportunities for improving database performance on large queries. Because CMPs often share execution, cache, or bandwidth resources among many hardware threads, implementing parallel database operators that efficiently share these resources is key to maximizing performance. A crucial aspect of this parallelism is managing concurrent, shared input and output to the parallel operators. In this paper we propose and evaluate a parallel buffer that enables intra-operator parallelism on CMPs by avoiding contention between hardware threads that need to concurrently read or write to the same buffer. The parallel buffer handles parallel input and output coordination as well as load balancing so individual operators do not need to reimplement that functionality.
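One standard way to avoid write contention on a shared buffer is to give each thread its own partition and merge on drain. The sketch below illustrates that general idea only; it is a hypothetical design, not the authors' parallel buffer:

```python
# Contention-free shared output: each thread appends to a private
# partition, so concurrent writers never fight over one buffer tail.
import threading

class ParallelBuffer:
    def __init__(self, nthreads):
        self.partitions = [[] for _ in range(nthreads)]

    def append(self, thread_id, item):
        # No lock needed: thread_id exclusively owns its partition.
        self.partitions[thread_id].append(item)

    def drain(self):
        # A consumer merges all partitions after the producers finish.
        return [x for part in self.partitions for x in part]

buf = ParallelBuffer(nthreads=4)

def produce(tid):
    for j in range(100):
        buf.append(tid, (tid, j))

threads = [threading.Thread(target=produce, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
out = buf.drain()
```

The coordination and load-balancing logic lives in the buffer, which mirrors the paper's goal of letting individual operators avoid reimplementing that machinery.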