{"title":"Width-Aware Fine-Grained Dynamic Supply Gating: A Design Methodology for Low-Power Datapath and Memory","authors":"L. Wang, Somnath Paul, S. Bhunia","doi":"10.1109/VLSID.2012.94","DOIUrl":null,"url":null,"abstract":"With increasing contribution of leakage in total active power, run-time leakage control techniques are becoming extremely important. Supply gating provides an effective, low-overhead and technology scalable approach for active leakage reduction through the well-known \"stacking effect\". However, conventional supply gating approaches are typically coarse-grained in both space and time - i.e. are applied to large data path or memory blocks when an entire logic/memory block is idle for sufficiently long period. They suffer from limited applicability at run time. On the other hand, fine-grained supply gating is constrained primarily by the large wake-up delay and wake-up power overhead. In this paper, we propose a novel fine-grained width-aware dynamic supply gating (WADSG) approach to reduce both active leakage and redundant switching power in data path and embedded memory (e.g. L1/L2 cache). The approach exploits the abundance of narrow-width (NW) operands in general-purpose and embedded applications to \"supply-gate\" unused parts of integer execution units and memory blocks while they are in use. We introduce a novel levelized gating strategy to virtually eliminate the wake-up delay overhead. We employ the proposed WADSG approach to a super scalar processor. To reduce the wake-up power we use a width aware instruction issue policy. In case of L1 and L2 cache, we store the width information per \"ways\" of associative cache and supply-gate the most significant bits of the NW ways. We also propose a width-aware block allocation and replacement policy to maximize the number of NW ways. Simulation results for 45nm technology with Spec2k benchmarks show major savings (34.5%) in total processor power (considering both switching and active leakage power) with no performance impact. As a by-product, the proposed scheme also improves the thermal profile of both data path and memory.","PeriodicalId":405021,"journal":{"name":"2012 25th International Conference on VLSI Design","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 25th International Conference on VLSI Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/VLSID.2012.94","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
With increasing contribution of leakage in total active power, run-time leakage control techniques are becoming extremely important. Supply gating provides an effective, low-overhead and technology scalable approach for active leakage reduction through the well-known "stacking effect". However, conventional supply gating approaches are typically coarse-grained in both space and time - i.e. are applied to large data path or memory blocks when an entire logic/memory block is idle for sufficiently long period. They suffer from limited applicability at run time. On the other hand, fine-grained supply gating is constrained primarily by the large wake-up delay and wake-up power overhead. In this paper, we propose a novel fine-grained width-aware dynamic supply gating (WADSG) approach to reduce both active leakage and redundant switching power in data path and embedded memory (e.g. L1/L2 cache). The approach exploits the abundance of narrow-width (NW) operands in general-purpose and embedded applications to "supply-gate" unused parts of integer execution units and memory blocks while they are in use. We introduce a novel levelized gating strategy to virtually eliminate the wake-up delay overhead. We employ the proposed WADSG approach to a super scalar processor. To reduce the wake-up power we use a width aware instruction issue policy. In case of L1 and L2 cache, we store the width information per "ways" of associative cache and supply-gate the most significant bits of the NW ways. We also propose a width-aware block allocation and replacement policy to maximize the number of NW ways. Simulation results for 45nm technology with Spec2k benchmarks show major savings (34.5%) in total processor power (considering both switching and active leakage power) with no performance impact. As a by-product, the proposed scheme also improves the thermal profile of both data path and memory.