A bandwidth-aware memory-subsystem resource management using non-invasive resource profilers for large CMP systems

HPCA-16 2010: The Sixteenth International Symposium on High-Performance Computer Architecture. Pub Date: 2010-04-01. DOI: 10.1109/HPCA.2010.5416654

Dimitris Kaseridis, Jeffrey Stuecheli, Jing Chen, L. John

Citations: 39

Abstract

By integrating multiple cores in a single chip, Chip Multiprocessors (CMPs) provide an attractive approach to improving both system throughput and efficiency. This integration allows on-chip resources to be shared, which may lead to destructive interference between the executing workloads. The memory subsystem is an important shared resource that contributes significantly to overall throughput and power consumption. To prevent destructive interference, the cache capacity and memory bandwidth requirements of the last-level cache have to be controlled. While previously proposed schemes focus on resource sharing within a chip, we explore additional possibilities both inside and outside a single chip. We propose a dynamic memory-subsystem resource management scheme that considers both cache capacity and memory bandwidth contention in large multi-chip CMP systems. Our approach uses low-overhead, non-invasive resource profilers based on Mattson's stack distance algorithm to project each core's resource requirements and guide our cache partitioning algorithms. Our bandwidth-aware algorithm seeks throughput optimizations across multiple chips by migrating workloads from the most resource-overcommitted chips to those with more available resources. Using bandwidth as a criterion results in an overall 18% reduction in memory bandwidth along with a 7.9% reduction in miss rate, compared to existing resource management schemes. Using a cycle-accurate full-system simulator, our approach achieved an average throughput improvement of 8.5%.
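
The abstract names Mattson's stack distance algorithm as the basis of the resource profilers. As a rough illustration of that classical algorithm only, the Python sketch below computes an LRU stack distance histogram from a block-address trace and uses it to project the miss count at any capacity from a single pass. It assumes a fully associative LRU cache and a software trace; the function names `stack_distance_histogram` and `misses_for_capacity` are hypothetical, and the sketch does not represent the paper's hardware profilers or its partitioning algorithm.

```python
from collections import Counter


def stack_distance_histogram(trace):
    """Return (hist, cold) for an LRU stack processed over `trace`.

    hist[d] counts accesses whose block was found at 1-based depth d
    of the LRU stack; cold counts first-time (compulsory) accesses.
    """
    stack = []        # index 0 holds the most recently used block
    hist = Counter()
    cold = 0
    for block in trace:
        if block in stack:
            depth = stack.index(block) + 1  # 1-based stack distance
            hist[depth] += 1
            stack.remove(block)
        else:
            cold += 1
        stack.insert(0, block)              # block becomes most recent
    return hist, cold


def misses_for_capacity(hist, cold, capacity):
    """Project misses for a fully associative LRU cache of `capacity` blocks.

    An access hits only if its stack distance is <= capacity, so every
    access with a larger distance, plus every cold access, is a miss.
    """
    return cold + sum(count for depth, count in hist.items() if depth > capacity)


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "b", "a"]
    hist, cold = stack_distance_histogram(trace)
    for capacity in (1, 2, 3):
        print(capacity, misses_for_capacity(hist, cold, capacity))
```

Because one histogram yields the projected miss count for every capacity, a profiler of this kind can estimate how a workload would respond to a larger or smaller cache allocation without re-running it, which is the property that makes stack distance useful for guiding partitioning decisions.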
