TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture

Jaekyu Lee, Hyesoon Kim
{"title":"TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture","authors":"Jaekyu Lee, Hyesoon Kim","doi":"10.1109/HPCA.2012.6168947","DOIUrl":null,"url":null,"abstract":"Combining CPUs and GPUs on the same chip has become a popular architectural trend. However, these heterogeneous architectures put more pressure on shared resource management. In particular, managing the last-level cache (LLC) is very critical to performance. Lately, many researchers have proposed several shared cache management mechanisms, including dynamic cache partitioning and promotion-based cache management, but no cache management work has been done on CPU-GPU heterogeneous architectures. Sharing the LLC between CPUs and GPUs brings new challenges due to the different characteristics of CPU and GPGPU applications. Unlike most memory-intensive CPU benchmarks that hide memory latency with caching, many GPGPU applications hide memory latency by combining thread-level parallelism (TLP) and caching. In this paper, we propose a TLP-aware cache management policy for CPU-GPU heterogeneous architectures. We introduce a core-sampling mechanism to detect how caching affects the performance of a GPGPU application. Inspired by previous cache management schemes, Utility-based Cache Partitioning (UCP) and Re-Reference Interval Prediction (RRIP), we propose two new mechanisms: TAP-UCP and TAP-RRIP. TAP-UCP improves performance by 5% over UCP and 11% over LRU on 152 heterogeneous workloads, and TAP-RRIP improves performance by 9% over RRIP and 12% over LRU.","PeriodicalId":380383,"journal":{"name":"IEEE International Symposium on High-Performance Comp Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"121","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Symposium on High-Performance Comp Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2012.6168947","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 121

Abstract

Combining CPUs and GPUs on the same chip has become a popular architectural trend. However, these heterogeneous architectures put more pressure on shared resource management. In particular, managing the last-level cache (LLC) is critical to performance. Recently, researchers have proposed several shared cache management mechanisms, including dynamic cache partitioning and promotion-based cache management, but no cache management work has been done on CPU-GPU heterogeneous architectures. Sharing the LLC between CPUs and GPUs brings new challenges due to the different characteristics of CPU and GPGPU applications. Unlike most memory-intensive CPU benchmarks, which hide memory latency with caching, many GPGPU applications hide memory latency by combining thread-level parallelism (TLP) with caching. In this paper, we propose a TLP-aware cache management policy for CPU-GPU heterogeneous architectures. We introduce a core-sampling mechanism to detect how caching affects the performance of a GPGPU application. Inspired by two previous cache management schemes, Utility-based Cache Partitioning (UCP) and Re-Reference Interval Prediction (RRIP), we propose two new mechanisms: TAP-UCP and TAP-RRIP. On 152 heterogeneous workloads, TAP-UCP improves performance by 5% over UCP and 11% over LRU, and TAP-RRIP improves performance by 9% over RRIP and 12% over LRU.
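The abstract only sketches the core-sampling idea: run two sampled GPGPU cores under opposite LLC insertion policies and compare their performance; because GPGPU cores of the same application behave symmetrically, a small gap suggests TLP is already hiding memory latency, so the LLC benefits the application little. The Python sketch below is a rough illustration of that intuition, not the paper's implementation; the names (`CoreSampler`, `gap_threshold`, `gpu_llc_ways`), the 5% threshold, and the way-allocation rule are all illustrative assumptions.

```python
class CoreSampler:
    """Hypothetical sketch of core sampling: two sampled GPGPU cores run
    the same kernel, one with LLC fills inserted at high priority and one
    at low priority; the performance gap indicates cache sensitivity."""

    def __init__(self, gap_threshold: float = 0.05):
        # Relative performance gap above which the GPGPU application is
        # treated as cache-sensitive (illustrative value, not from the paper).
        self.gap_threshold = gap_threshold

    def classify(self, cycles_high_priority: int, cycles_low_priority: int) -> str:
        """Compare cycle counts of the two sampled cores."""
        gap = abs(cycles_high_priority - cycles_low_priority) / max(
            cycles_high_priority, cycles_low_priority
        )
        return "cache-sensitive" if gap > self.gap_threshold else "cache-insensitive"


def gpu_llc_ways(classification: str, total_ways: int = 16) -> int:
    """Illustrative TAP-style decision: if TLP hides memory latency for the
    GPGPU application (cache-insensitive), give it a minimal LLC share so
    CPU applications can use the rest; otherwise fall back to a normal
    allocation (a naive half split stands in for UCP/RRIP here)."""
    return 1 if classification == "cache-insensitive" else total_ways // 2


# Example: the sampled cores perform almost identically (2% gap), so TLP
# is hiding memory latency and the GPU gets a minimal LLC share.
sampler = CoreSampler()
kind = sampler.classify(cycles_high_priority=1_000_000, cycles_low_priority=1_020_000)
print(kind, "->", gpu_llc_ways(kind), "LLC way(s) for the GPU")
```

In this hypothetical setup, only two of the many GPGPU cores are sampled, so the probing cost is negligible, and the remaining cores keep running under the policy the classification selects.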