Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2017-05-01 DOI:10.1109/IPDPS.2017.81

Bingchao Li, Ji-zhou Sun, M. Annavaram, N. Kim

{"title":"Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management","authors":"Bingchao Li, Ji-zhou Sun, M. Annavaram, N. Kim","doi":"10.1109/IPDPS.2017.81","DOIUrl":null,"url":null,"abstract":"GPUs provide high-bandwidth/low-latency on-chip shared memory and L1 cache to efficiently service a large number of concurrent memory requests (to contiguous memory space). To support warp-wide accesses to L1 cache, GPU L1 cache lines are very wide. However, such L1 cache architecture cannot always be efficiently utilized when applications generate many memory requests with irregular access patterns especially due to branch and memory divergences. In this paper, we propose Elastic-Cache that can efficiently support both fine- and coarse-grained L1 cache-line management for applications with both regular and irregular memory access patterns. Specifically, it can store 32- or 64-byte words in non-contiguous memory space to a single 128-byte cache line. Furthermore, it neither requires an extra tag storage structure nor reduces the capacity of L1 cache since it stores auxiliary tags for fine-grained L1 cache-line managements in sharedmemory space that is not fully used in many applications. Our experiment shows that Elastic-Cache improves the geo-mean performance of applications with irregular memory access patterns by 58% without degrading performance of applications with regular memory access patterns.","PeriodicalId":209524,"journal":{"name":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"2022 18","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2017.81","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 9

Abstract

GPUs provide high-bandwidth/low-latency on-chip shared memory and L1 cache to efficiently service a large number of concurrent memory requests (to contiguous memory space). To support warp-wide accesses to L1 cache, GPU L1 cache lines are very wide. However, such L1 cache architecture cannot always be efficiently utilized when applications generate many memory requests with irregular access patterns especially due to branch and memory divergences. In this paper, we propose Elastic-Cache that can efficiently support both fine- and coarse-grained L1 cache-line management for applications with both regular and irregular memory access patterns. Specifically, it can store 32- or 64-byte words in non-contiguous memory space to a single 128-byte cache line. Furthermore, it neither requires an extra tag storage structure nor reduces the capacity of L1 cache since it stores auxiliary tags for fine-grained L1 cache-line managements in sharedmemory space that is not fully used in many applications. Our experiment shows that Elastic-Cache improves the geo-mean performance of applications with irregular memory access patterns by 58% without degrading performance of applications with regular memory access patterns.

查看原文本刊更多论文

弹性缓存:高效的细粒度和粗粒度缓存线管理的GPU缓存架构

gpu提供高带宽/低延迟的片上共享内存和L1缓存，以有效地服务大量并发内存请求(对连续内存空间)。为了支持对L1缓存的warp-wide访问，GPU L1缓存线非常宽。但是，当应用程序生成许多具有不规则访问模式的内存请求时，特别是由于分支和内存分歧，这种L1缓存体系结构并不总是能够有效地利用。在本文中，我们提出了弹性缓存，它可以有效地支持细粒度和粗粒度L1缓存线管理，用于具有规则和不规则内存访问模式的应用程序。具体来说，它可以将非连续内存空间中的32字节或64字节的字存储到单个128字节的高速缓存线上。此外，它既不需要额外的标记存储结构，也不减少L1缓存的容量，因为它将用于细粒度L1缓存线管理的辅助标记存储在共享内存空间中，而共享内存空间在许多应用程序中并未得到充分利用。我们的实验表明，弹性缓存将具有不规则内存访问模式的应用程序的地理平均性能提高了58%，而不会降低具有常规内存访问模式的应用程序的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量