{"title":"Power-optimal pipelining in deep submicron technology","authors":"Seongmoo Heo, K. Asanović","doi":"10.1145/1013235.1013291","DOIUrl":"https://doi.org/10.1145/1013235.1013291","url":null,"abstract":"This paper explores the effectiveness of pipelining as a power saving tool, where the reduction in logic depth per stage is used to reduce supply voltage at a fixed clock frequency. We examine power-optimal pipelining in deep submicron technology, both analytically and by simulation. Simulation uses a 70 nm predictive process with a fanout-of-four inverter chain model including input/output flipflops, and results are shown to match theory well. The simulation results show that power-optimal logic depth is 6 to 8 FO4 and optimal power saving varies from 55 to 80% compared to a 24 FO4 logic depth, depending on threshold voltage, activity factor, and presence of clock-gating. We decompose the power consumption of a circuit into three components, switching power, leakage power, and idle power, and present the following insights into power-optimal pipelining. First, power-optimal logic depth decreases and optimal power savings increase for larger activity factors, where switching power dominates over leakage and idle power. Second, pipelining is more effective with lower threshold voltages at high activity factors, but higher threshold voltages give better results at lower activity factors where leakage current dominates. Lastly, clock-gating enables deeper pipelining and more power saving because it reduces timing element overhead when the activity factor is low.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122766830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SEPAS: A highly accurate energy-efficient branch predictor","authors":"A. Baniasadi, Andreas Moshovos","doi":"10.1145/1013235.1013250","DOIUrl":"https://doi.org/10.1145/1013235.1013250","url":null,"abstract":"Designers have invested much effort in developing accurate branch predictors with short learning periods. Such techniques rely on exploiting complex and relatively large structures. Although exploiting such structures is necessary to achieve high accuracy and fast learning, once the short learning phase is over, a simple structure can efficiently predict the branch outcome for the majority of branches. Moreover, for a large number of branches, once the branch reaches the steady state phase, updating the branch predictor unit is unnecessary since there is already enough information available to the predictor to predict the branch outcome accurately. Therefore, aggressive usage of complex large branch predictors appears to be inefficient since it results in unnecessary energy consumption. In this work we introduce Selective Predictor Access (SEPAS) to exploit this design inefficiency. SEPAS uses a simple power efficient structure to identify well behaved branch instructions that are in their steady state phase. Once such branches are identified, the predictor is no longer accessed to predict their outcome or to update the associated data. We show that it is possible to reduce the number of predictor accesses and energy consumption considerably with a negligible performance loss (worst case 0.25%).","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123629961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-power asynchronous Viterbi decoder for wireless applications","authors":"Mohamed Kawokgy, C. Salama","doi":"10.1145/1013235.1013306","DOIUrl":"https://doi.org/10.1145/1013235.1013306","url":null,"abstract":"This paper describes the implementation of an asynchronous 64-state, 1/2-rate Viterbi decoder using an original architecture and design methodology. The decoder is intended for wireless communications applications, where bit rates over 100 Mb/s and minimum power consumption are sought. The choice of an asynchronous design was predicated by the power and speed advantages of such a methodology. Asynchronous designs are inherently data driven and are active only when doing useful work, enabling considerable savings in power and operating at the average speed of all components. The decoder, implemented in a 0.18 μm CMOS technology, occupies an area of 2 mm² and operates above 200 Mb/s while consuming 85 mW: a 55% power reduction when compared to a state-of-the-art synchronous design implemented in a 0.25 μm technology.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132008303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}