{"title":"TAP:针对CPU-GPU异构架构的tlp感知缓存管理策略","authors":"Jaekyu Lee, Hyesoon Kim","doi":"10.1109/HPCA.2012.6168947","DOIUrl":null,"url":null,"abstract":"Combining CPUs and GPUs on the same chip has become a popular architectural trend. However, these heterogeneous architectures put more pressure on shared resource management. In particular, managing the last-level cache (LLC) is very critical to performance. Lately, many researchers have proposed several shared cache management mechanisms, including dynamic cache partitioning and promotion-based cache management, but no cache management work has been done on CPU-GPU heterogeneous architectures. Sharing the LLC between CPUs and GPUs brings new challenges due to the different characteristics of CPU and GPGPU applications. Unlike most memory-intensive CPU benchmarks that hide memory latency with caching, many GPGPU applications hide memory latency by combining thread-level parallelism (TLP) and caching. In this paper, we propose a TLP-aware cache management policy for CPU-GPU heterogeneous architectures. We introduce a core-sampling mechanism to detect how caching affects the performance of a GPGPU application. Inspired by previous cache management schemes, Utility-based Cache Partitioning (UCP) and Re-Reference Interval Prediction (RRIP), we propose two new mechanisms: TAP-UCP and TAP-RRIP. TAP-UCP improves performance by 5% over UCP and 11% over LRU on 152 heterogeneous workloads, and TAP-RRIP improves performance by 9% over RRIP and 12% over LRU.","PeriodicalId":380383,"journal":{"name":"IEEE International Symposium on High-Performance Comp Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"121","resultStr":"{\"title\":\"TAP: A TLP-aware cache management policy for a CPU-GPU heterogeneous architecture\",\"authors\":\"Jaekyu Lee, Hyesoon Kim\",\"doi\":\"10.1109/HPCA.2012.6168947\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Combining CPUs and GPUs on the same chip has become a popular architectural trend. However, these heterogeneous architectures put more pressure on shared resource management. In particular, managing the last-level cache (LLC) is very critical to performance. Lately, many researchers have proposed several shared cache management mechanisms, including dynamic cache partitioning and promotion-based cache management, but no cache management work has been done on CPU-GPU heterogeneous architectures. Sharing the LLC between CPUs and GPUs brings new challenges due to the different characteristics of CPU and GPGPU applications. Unlike most memory-intensive CPU benchmarks that hide memory latency with caching, many GPGPU applications hide memory latency by combining thread-level parallelism (TLP) and caching. In this paper, we propose a TLP-aware cache management policy for CPU-GPU heterogeneous architectures. We introduce a core-sampling mechanism to detect how caching affects the performance of a GPGPU application. Inspired by previous cache management schemes, Utility-based Cache Partitioning (UCP) and Re-Reference Interval Prediction (RRIP), we propose two new mechanisms: TAP-UCP and TAP-RRIP. 
TAP-UCP improves performance by 5% over UCP and 11% over LRU on 152 heterogeneous workloads, and TAP-RRIP improves performance by 9% over RRIP and 12% over LRU.\",\"PeriodicalId\":380383,\"journal\":{\"name\":\"IEEE International Symposium on High-Performance Comp Architecture\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-02-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"121\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Symposium on High-Performance Comp Architecture\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2012.6168947\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Symposium on High-Performance Comp Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2012.6168947","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Combining CPUs and GPUs on the same chip has become a popular architectural trend. However, these heterogeneous architectures place greater pressure on shared resource management. In particular, managing the last-level cache (LLC) is critical to performance. Researchers have proposed several shared cache management mechanisms, including dynamic cache partitioning and promotion-based cache management, but none of this work targets CPU-GPU heterogeneous architectures. Sharing the LLC between CPUs and GPUs brings new challenges because CPU and GPGPU applications behave differently: unlike most memory-intensive CPU benchmarks, which hide memory latency with caching, many GPGPU applications hide memory latency by combining thread-level parallelism (TLP) with caching. In this paper, we propose a TLP-aware cache management policy for CPU-GPU heterogeneous architectures. We introduce a core-sampling mechanism to detect how caching affects the performance of a GPGPU application. Building on two previous cache management schemes, Utility-based Cache Partitioning (UCP) and Re-Reference Interval Prediction (RRIP), we propose two new mechanisms: TAP-UCP and TAP-RRIP. TAP-UCP improves performance by 5% over UCP and 11% over LRU on 152 heterogeneous workloads, and TAP-RRIP improves performance by 9% over RRIP and 12% over LRU.
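The core-sampling idea can be illustrated with a minimal sketch. The assumption here (illustrative only, not the paper's implementation) is that two sampled GPGPU cores run the same workload under opposite LLC insertion policies and periodically report cycles and retired instructions; if their CPI values are nearly identical, TLP is already hiding memory latency and the application gains little from caching. All names and the 5% threshold below are hypothetical.

```cpp
// Hypothetical sketch of core sampling: compare the performance of two
// GPU cores whose LLC lines are inserted under opposite policies.
#include <cmath>
#include <cstdio>

struct SampledCore {
    double cycles = 0.0;        // cycles accumulated this sampling period
    double instructions = 0.0;  // instructions retired this sampling period
    double cpi() const { return instructions > 0 ? cycles / instructions : 0.0; }
};

// Assumed threshold: treat a CPI difference under 5% as "cache-insensitive".
constexpr double kSensitivityThreshold = 0.05;

// mruCore inserts its LLC lines at the MRU position (cache-friendly);
// lruCore inserts at the LRU position (effectively minimal caching).
bool gpuBenefitsFromCaching(const SampledCore& mruCore,
                            const SampledCore& lruCore) {
    double base = mruCore.cpi();
    if (base == 0.0) return false;
    double delta = std::fabs(base - lruCore.cpi()) / base;
    return delta > kSensitivityThreshold;
}

int main() {
    // Toy numbers: TLP hides most memory latency, so both policies end up
    // with nearly the same CPI and the workload is judged cache-insensitive.
    SampledCore mru{1000000.0, 800000.0};
    SampledCore lru{1020000.0, 800000.0};
    std::printf("caching helps GPU app: %s\n",
                gpuBenefitsFromCaching(mru, lru) ? "yes" : "no");
    return 0;
}
```

In the paper, this cache-sensitivity signal is what the TAP mechanisms consume: when caching does not help the GPGPU application, TAP-UCP can steer LLC capacity toward the CPU workloads and TAP-RRIP can deprioritize GPU lines.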