{"title":"A Novel Power Optimization Technique for Ultra-Low Power RFICs","authors":"A. Shameli, P. Heydari","doi":"10.1145/1165573.1165639","DOIUrl":"https://doi.org/10.1145/1165573.1165639","url":null,"abstract":"This paper presents a novel power optimization technique for ultra-low power (ULP) RFICs. A new figure of merit, namely the g<sub>m </sub>f<sub>T</sub>- to-current ratio (g<sub>m</sub>f<sub>T</sub>/I<sub>D</sub>), is defined for a MOS transistor, which accounts for both the unity-gain frequency and current consumption. It is demonstrated both analytically and experimentally that the g<sub>m</sub>f<sub>T</sub>/I<sub>D</sub> reaches its maximum value in moderate inversion region. Next, using the proposed method, a power optimized common-gate low-noise amplifier (LNA) with active load has been designed and fabricated in a CMOS 0.18mum process operating at 950MHz. Measurement results show a noise-figure (NF) of 4.9dB and a small signal gain of 15.6dB with a record-breaking power dissipation of only 100muW","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125419060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Efficient Dynamic Instruction Scheduling Logic through Instruction Grouping","authors":"Hiroshi Sasaki, Masaaki Kondo, Hiroshi Nakamura","doi":"10.1145/1165573.1165585","DOIUrl":"https://doi.org/10.1145/1165573.1165585","url":null,"abstract":"Dynamic instruction scheduling logic is quite complex and dissipates significant energy in microprocessors that support superscalar and out-of-order execution. We propose a novel microarchitectural technique to reduce the complexity and energy consumption of the dynamic instruction scheduling logic. The proposed method groups several instructions as a single issue unit and reduces the required number of ports and the size of the structure for dispatch, wakeup, select, and issue. The present paper describes the microarchitecture mechanisms and shows evaluation results for energy savings and performance. These results reveal that the proposed technique can greatly reduce energy with almost no performance degradation, compared to the conventional dynamic instruction scheduling logic","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132365737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power Reduction of Multiple Disks Using Dynamic Cache Resizing and Speed Control","authors":"Le Cai, Yung-Hsiang Lu","doi":"10.1145/1165573.1165617","DOIUrl":"https://doi.org/10.1145/1165573.1165617","url":null,"abstract":"This paper presents an energy-conservation method for multiple disks and their cache memory. Our method periodically resizes the cache memory and controls the rotation speeds under performance constraints. The cache memory stores the data from the disks for reuse. Enlarging the cache memory reduces disk accesses and disk utilization. This allows the disks to reduce their speeds and conserve energy because the disks' power consumption is quadratic to their speeds. However, the cache memory itself consumes power to retain data. Shrinking cache memory can save memory power while increasing disk accesses and degrading performance. Choosing proper cache sizes and rotation speeds can reduce the energy consumption of both memory and disks with satisfactory performance. We model cache resizing and speed setting as an optimization problem with minimizing the power consumption as objective and limiting disk utilization as constraints. We compare our method with the methods resizing cache based on request rates. The simulation results show that our method achieves better energy-savings while limiting disk access latency","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121274108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Register File Caching for Energy Efficiency","authors":"Hui Zeng, K. Ghose","doi":"10.1145/1165573.1165633","DOIUrl":"https://doi.org/10.1145/1165573.1165633","url":null,"abstract":"With the use of faster clocks and larger instruction windows in high-end superscalar processors, the physical register files (RFs) can no longer be accessed in a single cycle. To combat the consequential performance penalty, the RFs employ multiple levels of bypassing. Register file caching, which caches a small subset of the registers in a faster, smaller structure called the register file cache (RFC) has also been proposed as a remedy for this problem. We introduce a relatively simple RFC design that partitions the RFC into two separate components: a FIFO queue for holding register values that are used over a short duration following their writeback and another small set-associative cache holding values that are likely to be used over a longer duration. Results written to the RFC are easily classified into these categories and the classification bit is also used to predict the nature of the result for the next execution of the same instruction. We show that significant energy savings - about 38% on the average - occurs in accessing register operands when a 28-entry RFC is used, together with a 96-entry RF with no additional bypassing when compared with a base case design that has 128 registers with a 2 cycle access time and having one additional level of bypassing. The performance drop compared against the base case is also negligible (0.3% drop)","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133031217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variability-Aware Device Optimization under ION and Leakage Current Constraints","authors":"J. Jaffari, M. Anis","doi":"10.1145/1165573.1165601","DOIUrl":"https://doi.org/10.1145/1165573.1165601","url":null,"abstract":"In this paper, a novel device optimization methodology is presented that is constrained by the total leakage and the ON current of the device. The devised technique locates a maximum yield rectangular cube in a three-dimensional feasible space composed by oxide thickness, halo peak doping, and halo characteristic length parameters. The center of this cube is considered as the maximum yield design point with the highest immunity against variations. Monte Carlo simulations show that the optimized bulk-MOS device for 45 nm gate length satisfies the on current and leakage constraints under a variability of up to 30% in the three parameters","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122906631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low Power Viterbi Decoder Implementation using Scarce State Transition and Path Pruning Scheme for High Throughput Wireless Applications","authors":"Jie Jin, C. Tsui","doi":"10.1145/1165573.1165673","DOIUrl":"https://doi.org/10.1145/1165573.1165673","url":null,"abstract":"This paper presents a low power Viterbi decoder design based on scarce state transition (SST). We propose an approach which seamlessly integrates the path pruning techniques with the SST decoding to reduce the average add-compare-select (ACS) computation. The scheme has very low overhead and is practical for implementation. We also propose an uneven-partitioned memory architecture for the survivor memory unit to reduce the memory access power during the trace back operation. The proposed decoder is implemented in SMIC 0.18mum CMOS process. Simulation results show that significant power consumption reduction can be achieved for high throughput wireless systems such as MB-OFDM ultra-wide-band applications","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116895458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temperature-Aware Floorplanning of Microarchitecture Blocks with IPC-Power Dependence Modeling and Transient Analysis","authors":"Vidyasagar Nookala, D. Lilja, S. Sapatnekar","doi":"10.1145/1165573.1165644","DOIUrl":"https://doi.org/10.1145/1165573.1165644","url":null,"abstract":"Operating temperatures have become an important concern in high performance microprocessors. Floorplanning or block-level placement offers excellent potential for thermal optimization through better heat spreading between the blocks, but these optimizations can also impact the throughput of a microarchitecture, measured in terms of the number of instructions per cycle (IPC). In nanometer technologies, global buses can have multicycle delays that depend on the positions of the blocks, and it is important for a floorplanner to be microarchitecturally-aware to be sure that thermal and IPC considerations are appropriately balanced. This paper proposes a methodology for thermally-aware microarchitecture floorplanning. The approach models the interactions between the IPC and the temperature distribution, and incorporates both factors in the floorplanning cost function. Our approach uses transient modeling and optimizes both the peak and the average temperatures, and employs a design of experiments (DOE) based strategy, which effectively captures the huge exponential search space with a small number of cycle-accurate simulations. A comparison with a technique based on previous work indicates that the proposed approach results in good reductions both in the average and the peak temperatures for a range of SPEC benchmarks","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123199120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stall Cycle Redistribution in a Transparent Fetch Pipeline","authors":"Eric L. Hill, Mikko H. Lipasti","doi":"10.1145/1165573.1165583","DOIUrl":"https://doi.org/10.1145/1165573.1165583","url":null,"abstract":"Power and power density are now primary design constraints for modern high performance microprocessors. Up to 70% of the dynamic power consumed can be attributed to the clocking system. A consequence of this trend is that clock gating has emerged as both a necessary and efficient method to significantly reduce dynamic power. Transparent pipelining, a recently proposed fine-grain clock gating technique, has the potential to significantly reduce clock power above and beyond conventional pipestage-level clock gating. Previous studies of transparent pipelining have focused on the circuit and implementation-related issues of this approach, while neglecting the broader microarchitectural implications. This paper aims to quantify the microarchitectural opportunities that are afforded by the use of transparent pipelining in a processor's fetch pipeline. We develop a technique, based on stall cycle redistribution, designed to improve the performance of transparent pipelining on fetch and other high utilization pipelines. We show that stall cycle redistribution can dramatically reduce the clocking overhead of an aggressively pipelined cell-like microprocessor","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122469229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy-Efficient Virtual Memory System with Flash Memory as the Secondary Storage","authors":"Hung-Wei Tseng, Han-Lin Li, Chia-Lin Yang","doi":"10.1145/1165573.1165675","DOIUrl":"https://doi.org/10.1145/1165573.1165675","url":null,"abstract":"The traditional virtual memory system is designed for decades assuming a magnetic disk as the secondary storage. Recently, flash memory becomes a popular storage alternative for many portable devices with the continuing improvements on its capacity, reliability and much lower power consumption than mechanical hard drives. The NAND flash memory is organized with blocks, and each block contains a set of pages. The characteristics of flash memory are quite different from a magnetic disk. Therefore, in this paper, we revisit virtual memory system design considering limitations imposed by flash memory. In particular, we study the effects of the subpaging technique and storage cache management. In the traditional virtual memory system, a full page is written back to the secondary storage on a page fault. We found that this could result in unnecessary writes thereby wasting energy. The subpaging technique that partitions a page into subunits, and only dirty subpages are written to flash memory is beneficial to the energy efficiency. For the storage cache management, unlike traditional disk cache management, care needs to be taken to guarantee that the flash pages of a main memory page are replaced from the cache in sequence. Experimental results show that the average energy reduction of combined subpaging and caching techniques is 35.6%","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130893619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Level Converter Design for Sub-threshold Logic","authors":"I. Chang, Jae-Joon Kim, K. Roy","doi":"10.1145/1165573.1165579","DOIUrl":"https://doi.org/10.1145/1165573.1165579","url":null,"abstract":"The large supply voltage difference between sub-threshold core logic and I/O makes it extremely challenging to convert signals from core circuit to I/O circuit. In this paper, we propose two novel circuits, clock synchronizer and reduced swing inverter to design dynamic and static level converters for sub-threshold logic. Circuit simulations shows that our level converters work at frequency > 500kHz between 20degC and 40degC with a supply voltage of 0.25V","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130373609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}