Scalable aggregation on multicore processors

International Workshop on Data Management on New Hardware. Published: 2011-06-13. DOI: 10.1145/1995441.1995442

Yangang Ye, K. A. Ross, Norases Vesdapunt

Citations: 78

Abstract

In data-intensive and multi-threaded programming, the performance bottleneck has shifted from I/O bandwidth to main-memory bandwidth. The availability, size, and other properties of on-chip cache strongly influence performance. A key question is whether to let threads work independently or to coordinate a shared workload among them. The independent approach avoids synchronization overhead, but requires resources proportional to the number of threads and thus does not scale. The shared approach, on the other hand, suffers from coordination overhead and potential contention.

In this paper, we aim to provide a solution for in-memory parallel aggregation on the Intel Nehalem architecture. We consider several techniques that were previously proposed and evaluated on other architectures, including a hybrid independent/shared method and a method that automatically clones data items when contention is detected. We also propose two algorithms: partition-and-aggregate and PLAT. The PLAT and hybrid methods perform best overall: they use the computational power of multiple threads without needing memory proportional to the number of threads, and they avoid much of the coordination overhead and contention apparent in the shared-table method.
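The trade-off between independent and shared aggregation can be illustrated with a minimal Python sketch, assuming the aggregate is a per-key count (COUNT grouped by key) over an in-memory list of keys. The function names `aggregate_independent` and `aggregate_shared`, the even split of the input across threads, and the lock-striping parameter `num_stripes` are illustrative assumptions, not code from the paper. In standard CPython builds the global interpreter lock keeps pure-Python threads from running in parallel, so this shows the data layout and synchronization pattern rather than the performance behavior measured on Nehalem.

```python
import threading


def split_input(keys, num_threads):
    """Divide the input into num_threads contiguous slices, one per worker."""
    size = (len(keys) + num_threads - 1) // num_threads
    return [keys[t * size:(t + 1) * size] for t in range(num_threads)]


def run_workers(target, slices):
    """Start one thread per slice and wait for all of them to finish."""
    threads = [threading.Thread(target=target, args=(t, s))
               for t, s in enumerate(slices)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()


def aggregate_independent(keys, num_threads=4):
    """Independent tables: no synchronization, but one full table per thread."""
    tables = [{} for _ in range(num_threads)]

    def worker(t, chunk):
        table = tables[t]  # private to thread t, so no locking is needed
        for k in chunk:
            table[k] = table.get(k, 0) + 1

    run_workers(worker, split_input(keys, num_threads))
    # Merge phase: every group may appear in every private table, so memory
    # and merge work both grow with the number of threads.
    result = {}
    for table in tables:
        for k, count in table.items():
            result[k] = result.get(k, 0) + count
    return result


def aggregate_shared(keys, num_threads=4, num_stripes=16):
    """Shared table: one copy of each group, but every update takes a lock."""
    table = {}
    locks = [threading.Lock() for _ in range(num_stripes)]

    def worker(t, chunk):
        for k in chunk:
            # All updates to key k go through the same stripe lock, so a
            # heavily repeated key makes threads queue on that lock.
            with locks[hash(k) % num_stripes]:
                table[k] = table.get(k, 0) + 1

    run_workers(worker, split_input(keys, num_threads))
    return table


if __name__ == "__main__":
    data = [i % 7 for i in range(1000)]
    print(aggregate_independent(data) == aggregate_shared(data))  # True
```

Both functions return the same counts; they differ only in where the state lives and when threads must coordinate.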
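A hybrid independent/shared method can be sketched as follows: each thread keeps a small, fixed-size private table, so its memory does not grow with the number of distinct groups, and entries displaced from that table are merged into a single shared table. This is a simplified illustration, not the exact method evaluated in the paper. The direct-mapped private table, the evict-on-collision policy, and the names `aggregate_hybrid`, `local_slots`, and `num_stripes` are hypothetical choices made here, and CPython threads show the structure rather than real parallel speedup.

```python
import threading


def aggregate_hybrid(keys, num_threads=4, local_slots=64, num_stripes=16):
    """Hybrid aggregation sketch: small private tables backed by a shared table.

    Assumption: each private table is direct-mapped (one entry per slot) and a
    collision evicts the resident entry into the shared table.
    """
    shared = {}
    locks = [threading.Lock() for _ in range(num_stripes)]

    def flush(key, count):
        # Only evictions and the final flush touch shared state and take a lock.
        with locks[hash(key) % num_stripes]:
            shared[key] = shared.get(key, 0) + count

    def worker(chunk):
        slots = [None] * local_slots  # each slot holds [key, count] or None
        for k in chunk:
            i = hash(k) % local_slots
            entry = slots[i]
            if entry is not None and entry[0] == k:
                entry[1] += 1  # hit in the private table: no synchronization
                continue
            if entry is not None:
                flush(entry[0], entry[1])  # evict the resident group
            slots[i] = [k, 1]
        for entry in slots:  # push whatever is still cached locally
            if entry is not None:
                flush(entry[0], entry[1])

    size = (len(keys) + num_threads - 1) // num_threads
    threads = [threading.Thread(target=worker, args=(keys[t * size:(t + 1) * size],))
               for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared
```

Frequently repeated keys tend to stay resident in the private slots and are counted without locking, which is how a hybrid scheme can reduce the contention a single shared table suffers on skewed input; rarer keys pass through to the shared table, which holds each group only once.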
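A generic two-phase partition-then-aggregate scheme, which is one plausible reading of the partition-and-aggregate approach named in the abstract, can be sketched under simplifying assumptions: hash partitioning into one partition per thread, thread-private output buffers, and thread joins acting as the barrier between phases. The function name `aggregate_partitioned` is hypothetical, the paper's actual partitioning scheme and its PLAT algorithm are not reproduced here, and CPython threads illustrate the structure rather than real speedup.

```python
import threading


def aggregate_partitioned(keys, num_threads=4):
    """Two-phase partition-then-aggregate (generic sketch).

    Phase 1: each thread hash-partitions its slice into num_threads private
    buffers. Phase 2: thread p aggregates partition p from every thread's
    buffers. Partitions hold disjoint key sets, so neither phase needs locks.
    """
    n = num_threads
    size = (len(keys) + n - 1) // n
    # buffers[t][p]: keys from thread t's slice that hash to partition p
    buffers = [[[] for _ in range(n)] for _ in range(n)]
    results = [None] * n

    def partition(t):
        out = buffers[t]  # written only by thread t
        for k in keys[t * size:(t + 1) * size]:
            out[hash(k) % n].append(k)

    def aggregate(p):
        table = {}  # written only by thread p
        for t in range(n):
            for k in buffers[t][p]:
                table[k] = table.get(k, 0) + 1
        results[p] = table

    for phase in (partition, aggregate):
        threads = [threading.Thread(target=phase, args=(i,)) for i in range(n)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()  # barrier: phase 2 starts only after phase 1 finishes

    merged = {}
    for table in results:
        merged.update(table)  # disjoint keys, so nothing is overwritten
    return merged
```

No locks are taken in either phase; the cost is an extra pass over the data and buffer space for the partitioned input.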
