{"title":"Reducing leakage power by accounting for temperature inversion dependence in dual-Vt synthesized circuits","authors":"A. Calimera, R. I. Bahar, E. Macii, M. Poncino","doi":"10.1145/1393921.1393978","DOIUrl":"https://doi.org/10.1145/1393921.1393978","url":null,"abstract":"The effects of temperature on delay depend on several parameters, such as cell size, load, supply voltage, and threshold voltage. In particular, variations in Vth can yield a temperature inversion effect, causing a decrease in cell delay as temperature increases. This phenomenon, besides affecting timing analysis of a design, has important and unforeseeable consequences on power optimization techniques. In this paper, we focus on the impact of such effects on multi-Vt design; in particular, we show how traditional dual-Vt optimization may yield timing errors in circuits by ignoring temperature effects. Moreover, we present a temperature-aware dual-Vt optimization technique that reduces leakage power and can guarantee that the circuit is timing feasible at the boundary temperatures provided by the technology library. Our experiments show an average 27% leakage reduction with respect to a non-temperature-aware design flow.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125136896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SRAM methodology for yield and power efficiency: per-element selectable supplies and memory reconfiguration schemes","authors":"R. Kanj, R. Joshi, Zhuo Li, J. B. Kuang, H. Ngo, Nancy Y. Zhou, Weiping Shi, S. Nassif","doi":"10.1145/1393921.1393946","DOIUrl":"https://doi.org/10.1145/1393921.1393946","url":null,"abstract":"We present a novel power-aware yield enhancement design methodology and reconfiguration scheme for deep submicron SRAM designs. We show that with the continued trend of raising array supply to counter process variations, it is more effective to use a per-element selectable virtual power-supply scenario as opposed to single array supply with traditional redundancy schemes. The element can be a bank, a sub-array, or an independent row/column, and the element's virtual supply value is determined based on fail bitmaps. The technique can also be used in conjunction with traditional redundancy schemes to further improve the efficiency. The supply and redundancy assignments can be obtained by relying on memory reconfiguration algorithms. For this, we propose a greedy yet accurate algorithm that runs in O(n log n) as opposed to average-case O(n^2) traditional algorithms. The methodology leads to significant power savings ranging from 20% to 50% for 65 nm technology. We expect the savings to increase in future technologies as leakage powers dominate. To the best of our knowledge, this is the first time such a methodology is applied to SRAM designs.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127620177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System implications of integrated photonics","authors":"N. Jouppi","doi":"10.1145/1393921.1393923","DOIUrl":"https://doi.org/10.1145/1393921.1393923","url":null,"abstract":"Micron-scale photonic devices integrated with standard CMOS processes have the potential to dramatically increase system bandwidths, performance, and configuration flexibility while reducing system power. I first describe some recent developments in silicon nanophotonic technology, such as microring resonators. Small devices have many advantages: reduced power, increased density, and increased speed. By integrating many thousands of these devices on a chip, photonics could potentially be used for most high-speed off-chip and global on-chip communication. Integrated photonics has many advantages at the board and rack scale as well. Recent high-speed board-level electrical signaling (>2.5 GHz) precludes the use of multi-drop busses or communication over long distances on ordinary inexpensive PC board materials. By using photonics, high fan-out and high fan-in bus structures can be built. Due to the low loss of optical signals versus distance, these structures can even be distributed over rack-scale distances. This dramatically increases system flexibility while reducing interconnect power. As an example of the potential impact of photonics, I describe a system architecture for the 2017 time frame we call Corona. Corona is a 3D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Dense wavelength division multiplexed optically connected memory modules provide 10 terabyte per second memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabyte per second bandwidth. We believe that in comparison with an electrically-connected many-core alternative, Corona can provide 2 to 6 times more performance on many memory intensive workloads, while simultaneously significantly reducing power.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133417866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple power-gating domain (multi-VGND) architecture for improved leakage power reduction","authors":"A. Sathanur, L. Benini, A. Macii, E. Macii, M. Poncino","doi":"10.1145/1393921.1393938","DOIUrl":"https://doi.org/10.1145/1393921.1393938","url":null,"abstract":"Row-based power-gating has recently emerged as a meet-in-the-middle sleep transistor insertion paradigm between cell-level and block-level granularity, in which each layout row defines the unit of gating, and different rows can be clustered and share the same sleep transistor. Previous works, however, assume the availability of a single virtual ground voltage, thus making the decision of whether or not to gate a given cluster a binary choice: a cluster is either gated or not. In this work, we consider a limited set of virtual ground voltages, which allows us to assign to a cluster the virtual ground voltage that offers the best leakage-performance tradeoff for that cluster. We propose two algorithms for solving two power-gating variants: one in which the entire design is gated (given an allowable delay degradation), and another one in which only a subset of the rows is gated (given an allowable delay degradation and sleep transistor area). Our algorithm automatically finds the set of clusters with optimal virtual ground voltages so as to minimize leakage while respecting timing and area constraints. The number of power-gating domains can be user-bounded, in accordance with power grid or library characterization limitations. Results show that multiple virtual grounds yield an improvement of more than 34% over existing solutions that gate the entire design, and also provide sizable savings for the case of partial power-gating.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115018044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thread fusion","authors":"José González, Qiong Cai, P. Chaparro, G. Magklis, R. Rakvic, Antonio González","doi":"10.1145/1393921.1394018","DOIUrl":"https://doi.org/10.1145/1393921.1394018","url":null,"abstract":"This work proposes Thread Fusion as an effective way of reducing power consumption when a Simultaneous Multi-Threaded (SMT) core is executing two threads from a homogeneous parallel application. Two dynamic instances of the same static instruction, each from a different thread, are merged (fused) into a single instruction, consuming half of the resources of front-end pipeline stages. When the fused instruction is executed, it is cloned and it proceeds at full bandwidth. Our simulation results show an average energy reduction of 10% with less than 1% impact on performance.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124656134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A physical level study and optimization of CAM-based checkpointed register alias table","authors":"Elham Safi, Andreas Moshovos, A. Veneris","doi":"10.1145/1393921.1393982","DOIUrl":"https://doi.org/10.1145/1393921.1393982","url":null,"abstract":"Using full-custom layouts in 130 nm technology, this work studies how the latency and energy of a checkpointed, CAM-based Register Alias Table (cRAT) vary as a function of the window size, the issue width, and the number of embedded global checkpoints (GCs). These results are compared to those of the SRAM-based RAT (sRAT). Understanding these variations is useful during the early stages of architectural exploration where physical level information is not yet available. It is found that compared to sRAT, cRAT is more sensitive to the number of physical registers and issue width; however, it is less sensitive to the number of GCs. In addition, beyond a certain number of GCs, cRAT becomes faster than its equivalent sRAT. For instance, this is true when a RAT for 64 architectural and 128 physical registers has at least 20 GCs. This work also proposes an energy optimization for the cRAT; this optimization selectively disables cRAT entries that do not result in a match during lookup. The energy savings are, for the most part, a function of the number of physical registers. For instance, for a cRAT with 128 entries, energy is reduced by 40%.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125918364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low power high bandwidth amplifier with RC Miller and gain enhanced feedforward compensation","authors":"Shagun Bajoria, V. Singh, Raju Kunde, C. Parikh","doi":"10.1145/1393921.1393972","DOIUrl":"https://doi.org/10.1145/1393921.1393972","url":null,"abstract":"An improved frequency compensation technique is presented for low-power low-voltage three-stage operational amplifiers with high capacitive loads. The technique uses single RC Miller compensation and a direct gain enhanced feedforward path from the input to the output. With a load capacitance of 300 pF, the amplifier nominally achieves a dc gain of 74 dB, a 3-dB bandwidth of 2.9 kHz, a 52-degree phase margin, and a slew rate of 0.22 V/μs, while consuming 0.24 mW of power with a 1.2-V supply voltage, in a 180 nm CMOS technology. The 3-dB bandwidth is one of the highest reported for a high-gain three-stage CMOS amplifier.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126517611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entry control in network-on-chip for memory power reduction","authors":"Dongwook Lee, S. Yoo, Kiyoung Choi","doi":"10.1145/1393921.1393967","DOIUrl":"https://doi.org/10.1145/1393921.1393967","url":null,"abstract":"As high-end mobile embedded systems become data-intensive, the off-chip memory is becoming a major contributor to the total energy consumption. In particular, high-end mobile chips accommodate dedicated hardware blocks, e.g., codec and 3D graphics IPs, required for both performance and power consumption reasons. Those IPs usually do not have a large shared memory on chip. Thus, they communicate with each other via the off-chip DDR memory, increasing off-chip memory accesses, which increases memory energy consumption during read/write operations. In this paper, we present a method of reducing memory energy consumption during read/write operations. It aims at minimizing the number of row opens and closes, which are the major source of energy consumption during read/write operations. The basic idea is to apply network entry control to prioritize consecutive open row memory accesses. The experimental results show up to 35% reduction in memory energy consumption with an industrial strength multimedia mobile SoC.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132593630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Row/column redundancy to reduce SRAM leakage in presence of random within-die delay variation","authors":"M. Goudarzi, T. Ishihara","doi":"10.1145/1393921.1393947","DOIUrl":"https://doi.org/10.1145/1393921.1393947","url":null,"abstract":"Traditionally, spare rows/columns have been used in two ways: either to replace too leaky cells to reduce leakage, or to substitute faulty cells to improve yield. In contrast, we first choose a higher threshold voltage (Vth) and/or gate-oxide thickness (Tox) for SRAM transistors at design time to reduce leakage, and then substitute the resulting too slow cells by spare rows/columns. We show that due to within-die delay variation of SRAM cells only a few cells violate target timing at higher Vth or Tox; we carefully choose the Vth and Tox values such that the original memory timing-yield remains intact for a negligible extra delay. On a commercial 90 nm process assuming 3% variation in SRAM cell delay, we obtained 47% leakage reduction by adding only 5 redundant columns at negligible area, dynamic power and delay costs.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132976236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending the lifetime of media recorders constrained by battery and flash memory size","authors":"Younghyun Kim, Youngjin Cho, N. Chang, C. Chakrabarti, N. Cho","doi":"10.1145/1393921.1393964","DOIUrl":"https://doi.org/10.1145/1393921.1393964","url":null,"abstract":"The lifetime of a stand-alone media recorder is a function of both the battery size and flash memory size. In this paper, we present a power management framework for media recorders that significantly enhances their lifetime while minimizing the flash memory usage and maintaining the same level of recording quality. This is achieved by implementing a mixture of encoding algorithms of different complexities that generate data with different compression ratios, and in turn balancing the energy consumption and the flash memory usage. The proposed method can be effectively employed on a direct battery drive system which does not use a DC-DC converter. The gradual drop of the battery voltage of such a system is compensated by increasingly operating algorithms of lower complexity. For a speech encoding application where a mixture of ADPCM (low complexity) and MP3 (high complexity) is used, the proposed algorithm achieves 70% more lifetime than a DC-DC converter with the highest clock frequency, and 20% more lifetime than even a DC-DC converter with the optimal clock frequency.","PeriodicalId":166672,"journal":{"name":"Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133924458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}