Proceedings of the 11th International Workshop on Data Management on New Hardware: Latest Publications

Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771941

Jason Power, Yinan Li, M. Hill, J. Patel, D. Wood

Abstract: There have been a number of research proposals to use discrete graphics processing units (GPUs) to accelerate database operations. Although many of these works show up to an order of magnitude performance improvement, discrete GPUs are not commonly used in modern database systems. However, there is now a proliferation of integrated GPUs which are on the same silicon die as the conventional CPU. With the advent of new programming models like heterogeneous system architecture, these integrated GPUs are considered first-class compute units, with transparent access to CPU virtual addresses and very low overhead for computation offloading. We show that integrated GPUs significantly reduce the overheads of using GPUs in a database environment. Specifically, an integrated GPU is 3x faster than a discrete GPU even though the discrete GPU has 4x the computational capability. Therefore, we develop high performance scan and aggregate algorithms for the integrated GPU. We show that the integrated GPU can outperform a four-core CPU with SIMD extensions by an average of 30% (up to 3.2x) and provides an average of 45% reduction in energy on 16 TPC-H queries.

Citations: 25
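
The "simple scan-aggregate queries" in the entry above filter one column with a predicate and aggregate another column over the qualifying rows. The sketch below shows only that query shape in plain, single-threaded Python. It is not the GPU algorithm the paper develops; the function name `scan_aggregate` and the toy columns `ship_day` and `price` are assumptions made up for illustration.

```python
from typing import Sequence


def scan_aggregate(filter_col: Sequence[int], value_col: Sequence[int],
                   lo: int, hi: int) -> int:
    """Sum value_col[i] over all rows i where lo <= filter_col[i] < hi."""
    if len(filter_col) != len(value_col):
        raise ValueError("columns must have the same number of rows")
    total = 0
    # Scan: evaluate the predicate on every row of the filter column.
    for key, value in zip(filter_col, value_col):
        if lo <= key < hi:
            # Aggregate: accumulate the values of the qualifying rows.
            total += value
    return total


# Hypothetical toy table: ship dates encoded as day numbers, plus a price column.
ship_day = [10, 25, 31, 47, 52]
price = [100, 200, 300, 400, 500]
result = scan_aggregate(ship_day, price, 20, 50)  # rows with days 25, 31, 47
```

Each row is processed independently before the final sum, which is the kind of data parallelism that makes this query shape a natural candidate for SIMD units and GPUs.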

Applying HTM to an OLTP System: No Free Lunch

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771946

David Cervini, Danica Porobic, Pınar Tözün, A. Ailamaki

Citations: 5

Energy-Efficient In-Memory Data Stores on Hybrid Memory Hierarchies

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771940

Ahmad Hassan, H. Vandierendonck, Dimitrios S. Nikolopoulos

Citations: 12

NUMA obliviousness through memory mapping

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771948

M. Gawade, M. Kersten

Abstract: With the rise of multi-socket multi-core CPUs, a lot of effort is being put into how to best exploit their abundant CPU power. In a shared memory setting the multi-socket CPUs are equipped with their own memory module, and access memory modules across sockets in a non-uniform access pattern (NUMA). Memory access across sockets is relatively expensive compared to memory access within a socket. One of the common solutions to minimize cross-socket memory access is to partition the data, such that the data affinity is maintained per socket. In this paper we explore the role of memory mapped storage to provide transparent data access in a NUMA environment, without the need of explicit data partitioning. We compare the performance of a database engine in a distributed setting in a multi-socket environment, with a database engine in a NUMA oblivious setting. We show that though the operating system tries to keep the data affinity to local sockets, significant remote memory access still occurs as the number of threads increases. Hence, setting explicit process and memory affinity results in robust execution in NUMA oblivious plans. We use micro-experiments and SQL queries from the TPC-H benchmark to provide an in-depth experimental exploration of the landscape, on a four-socket Intel machine.

Citations: 8
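
The entry above concludes that setting process and memory affinity explicitly gives more robust execution. As a rough illustration of the process half only, the sketch below pins the calling process to a chosen set of CPUs with Python's `os.sched_setaffinity`, which exists only on some platforms (notably Linux). The helper name `pin_to_cpus` and the choice of CPU are assumptions. Which CPU ids belong to which socket is machine-specific, and memory placement is not handled here: that would need an external mechanism such as `numactl` or libnuma.

```python
import os
from typing import Iterable, Optional, Set


def pin_to_cpus(cpus: Iterable[int]) -> Optional[Set[int]]:
    """Restrict the calling process to `cpus` and return the new CPU set.

    Returns None on platforms where os.sched_setaffinity is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
    return set(os.sched_getaffinity(0))


# Assumption for illustration: pin to the lowest-numbered CPU currently allowed.
if hasattr(os, "sched_getaffinity"):
    original = set(os.sched_getaffinity(0))
    pinned = pin_to_cpus([min(original)])
    os.sched_setaffinity(0, original)  # restore the original affinity mask
else:
    original, pinned = None, None
```

In a real deployment the CPU list would be the set of cores on one socket, so that threads and the memory they touch stay on the same NUMA node.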

The Serial Safety Net: Efficient Concurrency Control on Modern Hardware

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771949

Tianzheng Wang, Ryan Johnson, A. Fekete, I. Pandis

Citations: 15

TLB misses: The Missing Issue of Adaptive Radix Tree?

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771942

Petrie Wong, Ziqiang Feng, Wenjian Xu, Eric Lo, B. Kao

Citations: 5

Energy-Efficient Query Processing on Embedded CPU-GPU Architectures

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771939

Xuntao Cheng, Bingsheng He, C. Lau

Abstract: Energy efficiency is a major design and optimization factor for query co-processing of databases in embedded devices. Recently, GPUs of new-generation embedded devices have evolved with the programmability and computational capability for general-purpose applications. Such CPU-GPU architectures offer us opportunities to revisit GPU query co-processing in embedded environments for energy efficiency. In this paper, we experimentally evaluate and analyze the performance and energy consumption of a GPU query co-processor on such hybrid embedded architectures. Specifically, we study four major database operators as micro-benchmarks and evaluate TPC-H queries on CARMA, which has a quad-core ARM Cortex-A9 CPU and an NVIDIA Quadro 1000M GPU. We observe that the CPU delivers both better performance and lower energy consumption than the GPU for simple operators such as selection and aggregation. However, the GPU outperforms the CPU for sort and hash join in terms of both performance and energy consumption. We further show that CPU-GPU query co-processing can be an effective means of energy-efficient query co-processing in embedded systems with proper tuning and optimizations.

Citations: 13

Ultra-Fast Similarity Search Using Ternary Content Addressable Memory

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771938

A. Bremler-Barr, Yotam Harchol, David Hay, Y. Hel-Or

Abstract: Similarity search, and specifically the nearest-neighbor search (NN) problem is widely used in many fields of computer science such as machine learning, computer vision and databases. However, in many settings such searches are known to suffer from the notorious curse of dimensionality, where running time grows exponentially with d. This causes severe performance degradation when working in high-dimensional spaces. Approximate techniques such as locality-sensitive hashing [2] improve the performance of the search, but are still computationally intensive. In this paper we propose a new way to solve this problem using a special hardware device called ternary content addressable memory (TCAM). TCAM is an associative memory, which is a special type of computer memory that is widely used in switches and routers for very high speed search applications. We show that the TCAM computational model can be leveraged and adjusted to solve NN search problems in a single TCAM lookup cycle, and with linear space. This concept does not suffer from the curse of dimensionality and is shown to improve the best known approaches for NN by more than four orders of magnitude. Simulation results demonstrate dramatic improvement over the best known approaches for NN, and suggest that TCAM devices may play a critical role in future large-scale databases and cloud applications.

Citations: 14
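
For readers unfamiliar with the TCAM model mentioned in the entry above, the sketch below emulates ternary matching in software. Each entry stores a value and a care mask; bits outside the mask are "don't care", and the lookup reports the first matching entry. This shows only the general matching rule. It does not reproduce the paper's encoding of nearest-neighbor search, and a real TCAM compares all entries in parallel rather than in a loop. The names `TcamEntry` and `tcam_lookup` and the 4-bit table are assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Each entry is (value, care_mask): bits where care_mask is 0 are "don't care".
TcamEntry = Tuple[int, int]


def tcam_lookup(entries: List[TcamEntry], key: int) -> Optional[int]:
    """Return the index of the first entry matching key, or None."""
    for index, (value, care_mask) in enumerate(entries):
        # Match when key and value agree on every bit the entry cares about.
        if ((key ^ value) & care_mask) == 0:
            return index
    return None


# Hypothetical 4-bit table, with the ternary pattern shown in each comment.
table = [
    (0b1010, 0b1111),  # 1010: exact match only
    (0b1000, 0b1100),  # 10**: any key whose top two bits are 10
    (0b0000, 0b0000),  # ****: wildcard, matches every key
]
hits = [tcam_lookup(table, k) for k in (0b1010, 0b1001, 0b0111)]
```

Ordering entries from most to least specific, as in the table above, makes the first match the most specific one, which is how priority is commonly expressed in TCAM tables.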

Scaling the Memory Power Wall With DRAM-Aware Data Management

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771947

Raja Appuswamy, Matthaios Olma, A. Ailamaki

Abstract: Improving the energy efficiency of database systems has emerged as an important topic of research over the past few years. While significant attention has been paid to optimizing the power consumption of traditional disk-based databases, little attention has been paid to the growing cost of DRAM power consumption in main-memory databases (MMDB). In this paper, we bridge this divide by examining power-performance tradeoffs involved in designing MMDBs. In doing so, we first show how DRAM will soon emerge as the dominating source of power consumption in emerging MMDB servers unlike traditional database servers, where CPU power consumption overshadows that of DRAM. Second, we show that using DRAM frequency scaling and power-down modes can provide substantial improvement in performance/Watt under both transactional and analytical workloads. This again contradicts rules of thumb established for traditional servers, where the most energy-efficient configuration is often the one with highest performance. Based on our observations, we argue that the long-overlooked task of optimizing DRAM power consumption should henceforth be considered a first-class citizen in designing MMDBs. In doing so, we highlight several promising research directions and identify key design challenges that must be overcome towards achieving this goal.

Citations: 25

Beyond the Wall: Near-Data Processing for Databases

Proceedings of the 11th International Workshop on Data Management on New Hardware Pub Date : 2015-05-31 DOI: 10.1145/2771937.2771945

S. Xi, Oreoluwatomiwa O. Babarinsa, Manos Athanassoulis, Stratos Idreos

Abstract: The continuous growth of main memory size allows modern data systems to process entire large scale datasets in memory. The increase in memory capacity, however, is not matched by a proportional decrease in memory latency, causing a mismatch for in-memory processing. As a result, data movement through the memory hierarchy is now one of the main performance bottlenecks for main memory data systems. Database systems researchers have proposed several innovative solutions to minimize data movement and to make data access patterns hardware-aware. Nevertheless, all relevant rows and columns for a given query have to be moved through the memory hierarchy; hence, movement of large data sets is on the critical path. In this paper, we present JAFAR, a Near-Data Processing (NDP) accelerator for pushing selects down to memory in modern column-stores. JAFAR implements the select operator and allows only qualifying data to travel up the memory hierarchy. Through a detailed simulation of JAFAR hardware we show that it has the potential to provide 9x improvement for selects in column-stores. In addition, we discuss both hardware and software challenges for using NDP in database systems as well as opportunities for further NDP accelerators to boost additional relational operators.

Citations: 85
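
The idea in the entry above, that a select pushed down to memory lets only qualifying data travel up the memory hierarchy, can be illustrated with a simple counting model. The sketch below is not a model of the JAFAR hardware and says nothing about the reported 9x figure. It only counts how many column values would reach the CPU with and without pushdown, and the function names, the column contents, and the predicate bounds are all hypothetical.

```python
from typing import List, Sequence, Tuple


def select_near_data(column: Sequence[int], lo: int, hi: int) -> List[int]:
    """Return only the values v with lo <= v <= hi (the qualifying data)."""
    return [v for v in column if lo <= v <= hi]


def values_moved(column: Sequence[int], lo: int, hi: int) -> Tuple[int, int]:
    """Count values crossing the memory hierarchy without and with pushdown."""
    without_pushdown = len(column)  # a CPU-side select must read every value
    with_pushdown = len(select_near_data(column, lo, hi))
    return without_pushdown, with_pushdown


# Hypothetical column and range predicate.
column = [5, 17, 42, 8, 23, 99, 15]
moved = values_moved(column, 10, 30)  # qualifying values: 17, 23, 15
```

In this model the saving grows as the predicate becomes more selective, since fewer values qualify while the full column length stays the same.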