Big data causing big (TLB) problems: taming random memory accesses on the GPU

Proceedings of the 13th International Workshop on Data Management on New Hardware. Pub Date: 2017-05-14. DOI: 10.1145/3076113.3076115

Tomas Karnagel, Tal Ben-Nun, Matthias Werner, Dirk Habich, Wolfgang Lehner


Citations: 21

Abstract

GPUs are increasingly adopted for large-scale database processing, where data accesses represent the major part of the computation. If the data accesses are irregular, such as hash table accesses or random sampling, GPU performance can suffer. In particular, when such accesses scale beyond 2 GB of data, performance drops by an order of magnitude. This paper analyzes the source of the slowdown through extensive micro-benchmarking, attributing the root cause to the Translation Lookaside Buffer (TLB). Using the micro-benchmarks, the TLB hierarchy and structure are fully analyzed on two different GPU architectures, identifying never-before-published TLB sizes that can be used for efficient large-scale application tuning. Based on the gained knowledge, we propose a TLB-conscious approach to mitigate the slowdown for algorithms with irregular memory access. The proposed approach is applied to two fundamental database operations, random sampling and hash-based grouping, showing that the slowdown can be dramatically reduced, resulting in a performance increase of up to 13×.
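
The abstract does not spell out how the TLB-conscious approach works, so the following is only an illustrative sketch of one general idea in that spirit, not the authors' algorithm. It assumes that irregular accesses can be reordered so that all accesses falling into the same address window are served together, which keeps the number of distinct pages touched at any one time small. It is written in plain Python on the CPU for readability; the names `region_order`, `gather_by_region`, `region_switches`, and the `region_size` parameter are hypothetical and do not come from the paper.

```python
def region_order(indices, region_size):
    """Return request positions ordered by the region their target index falls in.

    `region_size` is a hypothetical window size (in elements), standing in for
    the amount of memory a TLB could cover. sorted() is stable, so requests
    inside one region keep their original relative order.
    """
    return sorted(range(len(indices)), key=lambda pos: indices[pos] // region_size)


def gather_by_region(data, indices, region_size):
    """Compute [data[i] for i in indices], visiting one region at a time."""
    out = [None] * len(indices)
    for pos in region_order(indices, region_size):
        # Results are written back to their original position, so the
        # reordering is invisible to the caller.
        out[pos] = data[indices[pos]]
    return out


def region_switches(indices, region_size):
    """Count how often two consecutive accesses land in different regions."""
    regions = [i // region_size for i in indices]
    return sum(1 for a, b in zip(regions, regions[1:]) if a != b)


if __name__ == "__main__":
    data = [x * 2 for x in range(100)]
    indices = [90, 3, 55, 7, 91, 50]  # irregular accesses across three regions

    grouped = [indices[pos] for pos in region_order(indices, 10)]
    print(gather_by_region(data, indices, 10))
    print(region_switches(indices, 10), region_switches(grouped, 10))
```

The result is identical to a plain gather; only the visiting order changes. On a GPU, the analogous step would be a partitioning pass over the access indices before the actual lookups, and whether that pays off depends on the TLB sizes the paper reports measuring.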

