{"title":"A New Mismatch-Dependent Low Power Technique with Shadow Match-Line Voltage-Detecting Scheme for CAMs","authors":"Jianwei Zhang, Y. Ye, Bin-Da Liu","doi":"10.1145/1165573.1165605","DOIUrl":"https://doi.org/10.1145/1165573.1165605","url":null,"abstract":"A new mismatch-dependent low-power technique is presented for content-addressable memories (CAMs). With a novel shadow match-line voltage-detecting scheme, the word circuits quickly self-disable their charging paths when a mismatch occurs. Since the majority of CAM words are mismatched, power is significantly reduced while a high search speed is maintained. Simulation results show that the proposed 256-word × 144-bit ternary CAM, using a 0.13-µm 1.2-V CMOS process, achieves 0.51 fJ/bit/search for the word circuit with a search time of less than 900 ps. This represents a 77% energy-delay-product (EDP) reduction compared to the speed-optimized current-saving scheme","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130223543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stall Cycle Redistribution in a Transparent Fetch Pipeline","authors":"Eric L. Hill, Mikko H. Lipasti","doi":"10.1145/1165573.1165583","DOIUrl":"https://doi.org/10.1145/1165573.1165583","url":null,"abstract":"Power and power density are now primary design constraints for modern high performance microprocessors. Up to 70% of the dynamic power consumed can be attributed to the clocking system. A consequence of this trend is that clock gating has emerged as both a necessary and efficient method to significantly reduce dynamic power. Transparent pipelining, a recently proposed fine-grain clock gating technique, has the potential to significantly reduce clock power above and beyond conventional pipestage-level clock gating. Previous studies of transparent pipelining have focused on the circuit and implementation-related issues of this approach, while neglecting the broader microarchitectural implications. This paper aims to quantify the microarchitectural opportunities that are afforded by the use of transparent pipelining in a processor's fetch pipeline. We develop a technique, based on stall cycle redistribution, designed to improve the performance of transparent pipelining on fetch and other high utilization pipelines. We show that stall cycle redistribution can dramatically reduce the clocking overhead of an aggressively pipelined cell-like microprocessor","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122469229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Level Converter Design for Sub-threshold Logic","authors":"I. Chang, Jae-Joon Kim, K. Roy","doi":"10.1145/1165573.1165579","DOIUrl":"https://doi.org/10.1145/1165573.1165579","url":null,"abstract":"The large supply voltage difference between sub-threshold core logic and I/O makes it extremely challenging to convert signals from the core circuit to the I/O circuit. In this paper, we propose two novel circuits, a clock synchronizer and a reduced-swing inverter, to design dynamic and static level converters for sub-threshold logic. Circuit simulations show that our level converters work at frequencies > 500kHz between 20°C and 40°C with a supply voltage of 0.25V","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130373609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low Power Viterbi Decoder Implementation using Scarce State Transition and Path Pruning Scheme for High Throughput Wireless Applications","authors":"Jie Jin, C. Tsui","doi":"10.1145/1165573.1165673","DOIUrl":"https://doi.org/10.1145/1165573.1165673","url":null,"abstract":"This paper presents a low power Viterbi decoder design based on scarce state transition (SST). We propose an approach which seamlessly integrates the path pruning techniques with the SST decoding to reduce the average add-compare-select (ACS) computation. The scheme has very low overhead and is practical for implementation. We also propose an uneven-partitioned memory architecture for the survivor memory unit to reduce the memory access power during the trace back operation. The proposed decoder is implemented in SMIC 0.18-µm CMOS process. Simulation results show that significant power consumption reduction can be achieved for high throughput wireless systems such as MB-OFDM ultra-wide-band applications","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116895458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Power Light-weight Embedded Systems","authors":"M. Sarrafzadeh, F. Dabiri, R. Jafari, T. Massey, A. Nahapetian","doi":"10.1145/1165573.1165623","DOIUrl":"https://doi.org/10.1145/1165573.1165623","url":null,"abstract":"Light-weight embedded systems are gaining popularity due to recent technological advances in fabrication that have produced more powerful tiny processors with greater communication capabilities. These systems pose various scientific challenges for researchers. Perhaps the most significant challenges are energy consumption and reliability, mainly due to the small size of batteries. In this tutorial, we give a brief description of low-power, light-weight embedded systems, review several previously conducted power profiling studies, and present several research challenges that require low-power consumption in embedded systems. For each challenge, we highlight how low-power designs may enhance the overall performance of the system. Finally, we present several techniques that minimize the power consumption in such systems","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124436715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Process Variation Aware Cache Leakage Management","authors":"Ke Meng, R. Joseph","doi":"10.1145/1165573.1165636","DOIUrl":"https://doi.org/10.1145/1165573.1165636","url":null,"abstract":"Within a few technology generations, limitations of fabrication processes have made accurate design-time power estimates a daunting challenge. Static leakage current, which comprises a significant fraction of total power due to large on-chip caches, is exponentially dependent on widely varying physical parameters such as gate length, gate oxide thickness, and dopant ion concentration. In large structures like on-chip caches, this may mean that one portion of a cache consumes an order of magnitude more static power than an equivalently sized region. Under these conditions, egalitarian management of physical resources is clearly untenable. In this paper, we analyze the effects of within-die and die-to-die leakage variation for on-chip caches. We then propose way prioritization, a manufacturing variation aware scheme that minimizes cache leakage energy. Our results show that significant average power reductions are possible without undue hardware complexity or performance compromise","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"20 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124544651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power Reduction in an H.264 Encoder Through Algorithmic and Logic Transformations","authors":"M. Koziri, G. Stamoulis, I. Katsavounidis","doi":"10.1145/1165573.1165598","DOIUrl":"https://doi.org/10.1145/1165573.1165598","url":null,"abstract":"The H.264 video coding standard can achieve considerably higher coding efficiency than previous video coding standards. The keys to this high coding efficiency are the two prediction modes (intra & inter) provided by H.264. Unfortunately, these result in a considerably higher encoder complexity that adversely affects speed and power, which are both significant for the mobile multimedia applications targeted by the standard. Therefore, it is of high importance to design architectures that minimize the speed and power overhead of the prediction modes. In this paper we present a new algorithm, and the logic transformations that enable it, that can replace the standard sum of absolute differences (SAD) approach in the two main prediction modes, and provide a power efficient hardware implementation without perceivable degradation in coding efficiency or video quality","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114728468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective Writeback: Exploiting Transient Values for Energy-Efficiency and Performance","authors":"D. Balkan, J. Sharkey, D. Ponomarev, K. Ghose","doi":"10.1145/1165573.1165584","DOIUrl":"https://doi.org/10.1145/1165573.1165584","url":null,"abstract":"Today's superscalar microprocessors use large, heavily-ported physical register files (RFs) to increase the instruction throughput. The high complexity and power dissipation of such RFs mainly stem from the need to maintain each and every result for a large number of cycles after the result generation. We observed that a significant fraction (about 45%) of the result values are delivered to their consumers via the bypass network (consumed \"on-the-fly\") and are never read out from the destination registers. In this paper, we first formulate conditions for identifying such transient values and describe their microarchitectural implementation; then we propose a technique to avoid the writeback of such transient values into the RF. With 64-entry integer and floating point register files, our technique achieves an 11% performance improvement and 29% reduction in the RF energy consumption compared to the baseline machine with the same number of registers. Furthermore, for the same performance target, the selective writeback scheme results in a 38% reduction in the energy consumption of the RF compared to the baseline machine","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126133933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Pulsed Low-Voltage Swing Latch for Reduced Power Dissipation in High-Frequency Microprocessors","authors":"P. Lu, N. Cao, L. Sigal, P. Woltgens, R. Robertazzi, D. Heidel","doi":"10.1145/1165573.1165593","DOIUrl":"https://doi.org/10.1145/1165573.1165593","url":null,"abstract":"We previously reported (Pong-Fei Lu et al., 2004) a low-swing latch (LSL) with a superior performance-power tradeoff compared to the conventional pass-gate master-slave latch. In this paper, hardware results are presented for the proposed LSL with pulsed clock waveforms. The motivation is to combine low-voltage swing with pulsed signals to further reduce overall system power in high-frequency microprocessors. We have designed a 65-bit accumulator loop experiment to mimic a microprocessor pipeline stage. The local clock buffer design features a mode switch to toggle between two-phase (c1/c2) master-slave clocking and one-phase pulsed (c2 only) clocking. Our data show that 15-25% system power saving can be achieved in pulsed mode compared to non-pulsed mode. Power contribution from individual components is also presented","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125173432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy/Power Breakdown of Pipelined Nanometer Caches (90nm/65nm/45nm/32nm)","authors":"Samuel Rodríguez, B. Jacob","doi":"10.1145/1165573.1165581","DOIUrl":"https://doi.org/10.1145/1165573.1165581","url":null,"abstract":"As transistors continue to scale down into the nanometer regime, device leakage currents are becoming the dominant cause of power dissipation in nanometer caches, making it essential to model these leakage effects properly. Moreover, typical microprocessor caches are pipelined to keep up with the speed of the processor, and the effects of pipelining overhead need to be properly accounted for. In this paper, we present a study of pipelined nanometer caches with detailed energy/power dissipation breakdowns showing where and how the power is dissipated within a nanometer cache. We explore a three-dimensional pipelined cache design space that includes cache size (16kB to 512kB), cache associativity (direct-mapped to 16-way) and process technology (90nm, 65nm, 45nm and 32nm). Among our findings, we show that cache bitline leakage is increasingly becoming the dominant cause of power dissipation in nanometer technology nodes. We show that subthreshold leakage is the main cause of static power dissipation, and that gate leakage is, surprisingly, not a significant contributor to total cache power, even for 32nm caches. We also show that accounting for cache pipelining overhead is necessary, as power dissipated by the pipeline elements is a significant part of cache power","PeriodicalId":119229,"journal":{"name":"ISLPED'06 Proceedings of the 2006 International Symposium on Low Power Electronics and Design","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124666362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}