Ameliorating memory contention of OLAP operators on GPU processors

International Workshop on Data Management on New Hardware. Pub Date: 2012-05-21. DOI: 10.1145/2236584.2236590

Evangelia A. Sitaridi, K. A. Ross


Citations: 29

Abstract

Implementations of database operators on GPU processors have shown dramatic performance improvements compared to multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators implemented on a GPU must deal not only with contention caused by popular values but also with a new performance-limiting factor: thread serialization when accessing values that belong to the same bank. Here, we define the problem of bank and value conflict optimization for data processing operators using the CUDA platform. To analyze the impact of these two factors on operator performance, we use two database operations: foreign-key join and grouped aggregation. We suggest and evaluate techniques for optimizing the data arrangement offline by creating clones of values to reduce overall memory contention. Results indicate that columns used for writes, such as grouping columns, need to be optimized to fully exploit the maximum bandwidth of shared memory.
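The two kinds of contention named in the abstract can be illustrated with a small host-side model. The sketch below is not the authors' method and does not run on a GPU. It assumes 32 interleaved banks of one word each, that reads of the same word by several threads are served together, and that updates to the same word are serialized. The function names and the round-robin clone assignment are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

NUM_BANKS = 32  # assumption: 32 interleaved banks, one word wide


def bank_of(word_index, num_banks=NUM_BANKS):
    # Interleaved layout: consecutive words fall into consecutive banks.
    return word_index % num_banks


def read_degree(word_indices, num_banks=NUM_BANKS):
    # Bank conflicts for reads: distinct words in the same bank are
    # serialized; identical words are assumed to be served together.
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[bank_of(w, num_banks)].add(w)
    return max((len(s) for s in words_per_bank.values()), default=0)


def write_degree(word_indices, num_banks=NUM_BANKS):
    # Updates: every access to a bank is assumed to be serialized, so
    # a popular value (same word) also costs one pass per thread.
    hits_per_bank = defaultdict(int)
    for w in word_indices:
        hits_per_bank[bank_of(w, num_banks)] += 1
    return max(hits_per_bank.values(), default=0)


def cloned_targets(num_threads, num_clones, base_word=0):
    # Hypothetical cloning scheme: the popular value is copied into
    # num_clones consecutive words, and thread t updates clone t % num_clones.
    return [base_word + (t % num_clones) for t in range(num_threads)]


if __name__ == "__main__":
    warp = 32
    print("stride 1 reads:", read_degree(range(warp)))
    print("stride 32 reads:", read_degree([t * 32 for t in range(warp)]))
    print("same word, reads:", read_degree([0] * warp))
    print("same word, updates:", write_degree([0] * warp))
    print("4 clones, updates:", write_degree(cloned_targets(warp, 4)))
```

Under these assumptions, a popular value costs little when it is only read but serializes a whole group of threads when it is updated, and spreading the updates over clones placed in distinct banks reduces that serialization proportionally. This is consistent with the abstract's observation that columns used for writes are the ones that need optimizing.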

