A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, L. Benini, A. Macii, E. Macii, M. Poncino
{"title":"Dynamic Thermal Clock Skew Compensation using Tunable Delay Buffers","authors":"A. Chakraborty, K. Duraisami, A. Sathanur, P. Sithambaram, L. Benini, A. Macii, E. Macii, M. Poncino","doi":"10.1145/1165573.1165612","DOIUrl":"https://doi.org/10.1145/1165573.1165612","url":null,"abstract":"The thermal gradients existing in high-performance circuits may significantly affect their timing behavior, in particular by increasing the skew of the clock net and/or altering hold/setup constraints, possibly causing the circuit to operate incorrectly. The knowledge of the spatial distribution of temperature can be used to properly design a clock network that is able to compensate such thermal non-uniformities. However, re-design of the clock network is effective only if temperature distribution is stationary, i.e., does not change over time. In this work, we specifically address the problem of dynamically modifying the clock tree in such a way that it can compensate for temporal variations of temperature. This is achieved by exploiting the buffers that are inserted during the clock network generation, by transforming them into tunable delay elements. Temperature-induced delay variations are then compensated by applying the proper tuning to the tunable buffers, which is computed off-line and, stored in a tuning table inserted in the design. We propose an algorithm to minimize the number of inserted tunable buffers, as well as their tunable range (which directly relates to complexity). Results show that clock skew is kept within original bounds with minimum area and power penalty. The maximum increase in power is 23.2% with most benchmarks exhibiting less than 5% increase in power","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114153677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Power Active Substrate-Noise Decoupling Circuit with Feedforward Compensation for Mixed-Signal SoCs","authors":"Song Guo, Hoi Lee","doi":"10.1145/1165573.1165649","DOIUrl":"https://doi.org/10.1145/1165573.1165649","url":null,"abstract":"This paper presents a low-power active-decoupling circuit using feedforward-compensation technique for SoC substrate-noise reduction. The proposed feedforward technique not only generates a left-half-plane zero for the decoupling circuit achieving stability and wider bandwidth in low-power condition, but also increases the dynamic current during transient. As a result, substrate-noise suppression has been significantly improved with larger-amplitude and higher-frequency noise sources. In a standard 0.13mum CMOS process, simulation results show that the proposed feedforward technique enhances both the bandwidth and dynamic current of the decoupling circuit by 2 and 79 times, respectively, without additional static power consumption. The decoupling circuit thus improves the crosstalk noise suppression from 3.6 to 6.6 times with a 1GHz noise-source amplitude increasing from 100mV to 500mV","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132279722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Super Cut-off Transistors for Ultralow Power Digital Logic Circuits","authors":"A. Raychowdhury, Xuanyao Fong, Qikai Chen, K. Roy","doi":"10.1145/1165573.1165577","DOIUrl":"https://doi.org/10.1145/1165573.1165577","url":null,"abstract":"Super cut-off devices with sub-60mV/decade subthreshold swings have recently been demonstrated and being extensively studied. This paper presents a feasibility analysis of such tunneling devices for ultralow power subthreshold logic. Analysis shows that this device can deliver 800times higher performance (@iso-IOFF) compared to a MOSFET. The possible use of this device as a sleep transistor in conjunction with the regular Si MOSFET shows 2000times average improvement in leakage power compared to Si MOSFETs","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126081444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Approach for Variation Aware Power Minimization during Gate Sizing","authors":"V. Mahalingam, N. Ranganathan, J. Harlow","doi":"10.1145/1165573.1165614","DOIUrl":"https://doi.org/10.1145/1165573.1165614","url":null,"abstract":"Increasing dominance of process variations in the nanometer designs is posing significant challenges for circuit design and optimization. The variations in parameters such as channel length and the gate oxide thickness impacts circuit delay and power. In this paper, we propose a new gate sizing algorithm using fuzzy mathematical programming (FMP) in which the uncertainty due to process variations is modeled using fuzzy numbers. The variations in gate delay, which is a function of gate sizes and the fan-outs of the gate, are represented using triangular fuzzy numbers with linear membership functions. The variation aware gate sizing problem is formulated as a fuzzy mathematical program to perform a delay constrained power minimization in the presence of variations. Initially, a deterministic optimization is performed by fixing the fuzzy parameters to the worst and the average case values and the results are used to convert the fuzzy optimization problem into a crisp non-linear problem which is then solved using a non-linear optimization solver. The above model with delay and power as constraints, maximizes the robustness, i.e., the variation resistance of the circuit and thus the yield. The proposed approach was tested on ISCAS'85 benchmarks and the results were validated for timing yield using Monte-Carlo simulations. The fuzzy approach yields significantly better results compared to stochastic programming based gate sizing approach with a comparable runtime","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"1997 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121073678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power Phase Variation in a Commercial Server Workload","authors":"W. Bircher, L. John","doi":"10.1145/1165573.1165656","DOIUrl":"https://doi.org/10.1145/1165573.1165656","url":null,"abstract":"Many techniques have been developed for adaptive power management of computing systems. These techniques rely on the presence of varying power phases to detect opportunities for adaptation. However, little information is available regarding the extent of power phases in real systems. This paper illustrates available power phases ranging from 1 millisecond to 1 second using a commercial workload running on enterprise class hardware. Data is obtained using a server instrumented for power measurement at the subsystem level. The analysis shows that chipset, memory and disk subsystems have the most homogenous phase behavior with greater than 71% of samples within phases of 100 milliseconds or shorter. In contrast, CPU and I/O subsystems have much more variation with only 26% of samples within phases of 10 milliseconds or shorter","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"602 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123326910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing Idle Mode Power in Software Defined Radio Terminals","authors":"Hyunseok Lee, T. Mudge, C. Chakrabarti","doi":"10.1145/1165573.1165597","DOIUrl":"https://doi.org/10.1145/1165573.1165597","url":null,"abstract":"In this paper, we propose a processor which is optimized for idle mode operation of a software defined radio (SDR) terminal. Since a SDR terminal spends most of its time in the idle mode, reducing the power consumption in this mode directly translates to longer terminal standby time. Workload analysis of idle mode operations of contemporary standards showed that these are dominated by FIR filtering, which can be easily parallelized. This analysis was used in the design of the idle mode processor. The key architectural components are an SIMD unit for the parallel computations that dominate the workload, a conventional scalar unit for the sequential computations, and a control unit which supports efficient data memory access and loop control. The idle mode processor was modeled with Verilog and synthesized using standard cells in 0.13 micron technology. It consumes about 9mW at 1.08V","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117123476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-efficient Motion Estimation using Error-Tolerance","authors":"G. Varatkar, Naresh R Shanbhag","doi":"10.1145/1165573.1165599","DOIUrl":"https://doi.org/10.1145/1165573.1165599","url":null,"abstract":"Presented is an energy-efficient motion estimation architecture using error-tolerance. The technique employs overscaling of the supply voltage (voltage overscaling (VOS)) to reduce power at the expense of timing errors, which are then corrected using algorithmic noise-tolerance (ANT) techniques. Referred to as input subsampled replica ANT (ISR-ANT), the proposed technique incorporates an input subsampled replica of the main sum of absolute difference (MSAD) block for obtaining the motion vectors in the presence of errors induced by VOS. Simulations show that the proposed technique can save up to 60% power over an optimal error-free present day system in a 130nm CMOS technology. Power savings increase to 79% in a 45nm predictive process technology","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115377090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing Cache Traffic and Energy with Macro Data Load","authors":"Lei Jin, Sangyeun Cho","doi":"10.1145/1165573.1165608","DOIUrl":"https://doi.org/10.1145/1165573.1165608","url":null,"abstract":"This paper presents a study on macro data load, an efficient mechanism to enhance loaded value reuse. A macro data load brings into the processor a maximum-width data value the cache port allows, saves it in an internal structure, and facilitates reuse by later loads. A comprehensive limit study using a generalized memory value reuse table (MVRT) shows the significantly increased reuse opportunities provided by macro data load. We also describe a modified load store queue design as an implementation of the proposed concept. Our quantitative study shows that over 35% of L1 cache accesses in the SPEC2k integer and MiBench programs can be eliminated, resulting in a related energy reduction of 24% and 35% on average, respectively","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128740391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Magklis, P. Chaparro, José González, Antonio González
{"title":"Independent Front-end and Back-end Dynamic Voltage Scaling for a GALS Microarchitecture","authors":"G. Magklis, P. Chaparro, José González, Antonio González","doi":"10.1145/1165573.1165586","DOIUrl":"https://doi.org/10.1145/1165573.1165586","url":null,"abstract":"In recent years, globally asynchronous locally synchronous (GALS) designs and dynamic voltage scaling (DVS) have emerged as some of the most popular approaches to address the ever increasing microprocessor energy consumption. In this work, we propose two on-line algorithms for adjusting dynamically, and independently, the voltage and frequency of the front-end and back-end domains of a novel two-domain microprocessor. We evaluate our mechanisms for both internal and external voltage regulators, and we present optimal dynamic voltage scaling results for the proposed microarchitecture. Our schemes achieve average improvement of 12% of the energy-delay metric, when using internal voltage regulators","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116409926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synchronization-Driven Dynamic Speed Scaling for MPSoCs","authors":"M. Loghi, M. Poncino, L. Benini","doi":"10.1145/1165573.1165655","DOIUrl":"https://doi.org/10.1145/1165573.1165655","url":null,"abstract":"Equalizing the ratios between workloads and speeds of processing elements provides the optimal speed allocation. Based on that principle, this work describes a dynamic speed setting policy for multiprocessor systems-on-chip (MPSoCs) that relies on the estimation of processor idle times specifically due to the synchronization work. The policy provides two advantages: first, it does not rely on any assumption about the communication pattern of the application executed by the system. Second, it is purely architectural; it automatically detects changes in the system workload and sets processors speeds accordingly by means of a custom hardware block. Results on a parallel MPEG video decoding application show an EDP saving above 55%, averaged over several datasets, corresponding to an energy saving above 50%, and a corresponding penalty in performance below 8%","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130592315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}