Energy-Efficient Cache Coherence Protocols in Chip-Multiprocessors for Server Consolidation

2011 International Conference on Parallel Processing. Publication date: 2011-09-13. DOI: 10.1109/ICPP.2011.44

Antonio García-Guirado, Ricardo Fernández Pascual, Alberto Ros, José M. García

Citations: 8

Abstract

As the number of cores on a chip increases, power consumption is becoming a major constraint in the design of chip multiprocessors. At the same time, server consolidation is gaining importance as a way to take advantage of such a large number of cores. Our goal is to alleviate this constraint by using the cache coherence protocol to reduce the power consumption of chip multiprocessors running consolidated workloads. To this end, we statically divide the chip into areas. This allows us to reduce both the directory overhead needed to support coherence and the network traffic, which translates into lower power consumption without performance degradation. Cache coherence is maintained per area, and pointers are used to link the areas, thereby achieving isolation among virtual machines and savings in memory requirements. Additionally, the coherence protocol dynamically selects one node per area to be responsible for providing the data on a cache miss, thus reducing the average cache miss latency and the traffic among areas. We compare against a highly optimized directory implementation on a 64-tile chip multiprocessor with 4 virtual machines. Leakage power consumption is reduced by 54%, and the dynamic power consumption of the caches and the network-on-chip is reduced by up to 38%, with no performance degradation.
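The abstract does not describe the protocol's actual data structures. The following Python sketch is therefore only an illustration of the general idea under stated assumptions, not the authors' design. It makes these assumptions:

- a 64-tile chip is split statically into four contiguous areas of 16 tiles, one per virtual machine;
- each area keeps its own sharer set and one provider tile per block;
- areas holding the same block are chained through area pointers, forming a simple linked list;
- the first tile to obtain a block within an area becomes that area's provider;
- the lowest-numbered remaining sharer takes over if the provider evicts its copy.

All names (area_of, AreaEntry, Block, read_miss, evict, sharer_bits) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Assumed configuration: a 64-tile CMP statically split into 4 areas (one per VM).
NUM_TILES = 64
NUM_AREAS = 4
TILES_PER_AREA = NUM_TILES // NUM_AREAS


def area_of(tile: int) -> int:
    """Hypothetical static mapping: tiles 0-15 -> area 0, 16-31 -> area 1, ..."""
    return tile // TILES_PER_AREA


@dataclass
class AreaEntry:
    """Per-area state for one block: local sharers, the provider, and a link."""
    sharers: Set[int] = field(default_factory=set)
    provider: Optional[int] = None  # tile that supplies data on misses in this area
    link: Optional[int] = None      # pointer to the next area that also holds the block


@dataclass
class Block:
    """Per-block state: one entry per area holding it, plus a head pointer."""
    areas: Dict[int, AreaEntry] = field(default_factory=dict)
    head_area: Optional[int] = None


def read_miss(block: Block, tile: int) -> str:
    """Serve a read miss from `tile` and report who supplied the data."""
    area = area_of(tile)
    entry = block.areas.get(area)
    if entry is not None:
        # Invariant: an existing entry always has a provider among its sharers.
        source = f"provider {entry.provider} (same area)"
    else:
        if block.head_area is not None:
            remote = block.areas[block.head_area]
            source = f"provider {remote.provider} (area {block.head_area})"
        else:
            source = "memory"
        # First holder in this area becomes its provider; link the area at the list head.
        entry = AreaEntry(provider=tile, link=block.head_area)
        block.areas[area] = entry
        block.head_area = area
    entry.sharers.add(tile)
    return source


def evict(block: Block, tile: int) -> None:
    """Remove `tile`'s copy, re-electing the provider or unlinking the area."""
    area = area_of(tile)
    entry = block.areas.get(area)
    if entry is None or tile not in entry.sharers:
        return
    entry.sharers.discard(tile)
    if entry.sharers:
        if entry.provider == tile:
            entry.provider = min(entry.sharers)  # assumed re-election policy
        return
    # Last copy in this area: splice the area out of the linked list of areas.
    if block.head_area == area:
        block.head_area = entry.link
    else:
        for other in block.areas.values():
            if other.link == area:
                other.link = entry.link
    del block.areas[area]


def sharer_bits(full_map: bool) -> int:
    """Sharing-information bits per directory entry under this sketch's encoding."""
    if full_map:
        return NUM_TILES  # one presence bit per tile on the chip
    pointer_bits = (NUM_AREAS - 1).bit_length()  # bits needed to name another area
    return TILES_PER_AREA + pointer_bits


if __name__ == "__main__":
    b = Block()
    print(read_miss(b, 3))   # memory
    print(read_miss(b, 5))   # provider 3 (same area)
    print(read_miss(b, 20))  # provider 3 (area 0)
    evict(b, 3)
    print(read_miss(b, 7))   # provider 5 (same area)
    print(sharer_bits(True), sharer_bits(False))  # 64 18
```

In this particular encoding, a per-area entry needs 18 bits of sharing information instead of the 64 bits of a full-map vector. When the consolidated virtual machines share little data, most blocks would occupy a single area entry, and that is where the directory savings of such a scheme would come from. Misses that find a provider in their own area also avoid crossing into other areas.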
