{"title":"Mitigating inductive noise in SMT processors","authors":"W. El-Essawy, D. Albonesi","doi":"10.1109/LPE.2004.241160","DOIUrl":"https://doi.org/10.1109/LPE.2004.241160","url":null,"abstract":"Simultaneous multi-threading, although effective in increasing processor throughput, exacerbates the inductive noise problem such that more expensive electronic solutions are required even with the use of previously proposed microarchitectural approaches. We use detailed microarchitectural simulation together with the Pentium 4 power delivery model to demonstrate the impact of SMT on inductive noise, and to identify thread-specific microarchitectural reasons for high noise occurrences. We make the key observation that the presence of multiple threads actually provides an opportunity to mitigate the cyclical current fluctuations that cause noise, and propose the use of a prior performance enhancement technique to achieve this purpose.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114005723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balanced energy optimization","authors":"J. Cornish","doi":"10.1145/1023833.1023835","DOIUrl":"https://doi.org/10.1145/1023833.1023835","url":null,"abstract":"Summary form only given. Energy efficiency is now the number one issue for many applications, determining weight and cost, and constraining system performance. Many techniques have been developed to minimize the dynamic and static power consumed by digital designs without any impact on functionality. To achieve further savings it is necessary to employ methods that do constrain functionality in some way. The designer must then balance increased energy efficiency with the functional implications of those techniques. In communications systems non-zero error rates are accommodated and corrected in order to reduce power. In digital designs it is also possible to accept and correct errors generated when worst case timing paths exceed the clock interval. This allows the design to be operated beyond the worst case point at a reduced voltage to save energy. The increased energy efficiency must then be balanced against a decrease in determinism and the addition of error detection and correction structures. Processing scalability can also be employed to increase energy efficiency for workloads which vary dynamically. In single processor system this can be achieved using voltage and frequency scaling, and in multi-processor systems this can be supplemented with adaptive shutdown of unused processors. Scalability does imply a loss of system responsiveness when workloads transition from low to high levels, and this must be balanced against the increased energy efficiency achieved. Power efficiency can also be increased by optimising a processor for the application it is intended to run. By analyzing the algorithms to be executed it is possible to create a processor tailored to its workload. This loss of generality and flexibility must be balanced against the increased energy efficiency of a customized implementation. This talk describes work which ARM and its partners are doing to balance energy efficiency with functionality to create optimized designs.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114396711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of correlating caches","authors":"A. Mallik, M. Wildrick, G. Memik","doi":"10.1145/1013235.1013255","DOIUrl":"https://doi.org/10.1145/1013235.1013255","url":null,"abstract":"We introduce a new cache architecture that can be used to increase performance and reduce energy consumption in Network Processors. This new architecture is based on the observation that there is a strong correlation between different memory accesses. In other words, if load X and load Y are two consecutively executed load instructions, the offset between the source addresses of these instructions remain usually constant between different iterations. We utilize this information by building a correlating cache architecture. This architecture consists of a Dynamic Correlation Extractor, a Correlation History Table, and a Correlation Buffer. We first show simulation results investigating the frequency of correlating loads. Then, we evaluate our architecture using SimpleScalar/ARM. For a set of representative applications, the correlating cache architecture is able to reduce the average data access time by as much as 52.7% and 36.1/% on average, while reducing the energy consumption of the caches by as much as 49.2% and 25.7% on average.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115507249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated adaptive DC/DC conversion with adaptive pulse-train technique for low-ripple fast-response regulation","authors":"Chuang Zhang, D. Ma, A. Srivastava","doi":"10.1145/1013235.1013301","DOIUrl":"https://doi.org/10.1145/1013235.1013301","url":null,"abstract":"Dynamic voltage scaling (DVS) is a very effective low-power design technique in modem digital IC systems. On-chip adaptive DC/DC converter, which provides adjustable output voltage, is a key component in implementing DVS-enabled system. This paper presents a new adaptive DC/DC converter design, which adopts a delay-line controller for voltage regulation. With a proposed adaptive pulse-train technique, ripple voltages are reduced by 50%, while the converter still maintains satisfying transient response. With a supply voltage of 3.3V, the output of the converter is well regulated from 1.7 to 3.0V. Power consumption of the controller is below 100 /spl mu/W. Maximum efficiency of 92% is achieved with output power of 125mW. Chip area is 0.8 /spl times/ 1.2mm/sup 2/ in 1.5 /spl mu/m standard CMOS process.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123061378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seokwoo Lee, Shidhartha Das, Toan Pham, T. Austin, D. Blaauw, T. Mudge
{"title":"Reducing pipeline energy demands with local DVS and dynamic retiming","authors":"Seokwoo Lee, Shidhartha Das, Toan Pham, T. Austin, D. Blaauw, T. Mudge","doi":"10.1145/1013235.1013313","DOIUrl":"https://doi.org/10.1145/1013235.1013313","url":null,"abstract":"The quadratic relationship between voltage and energy has made dynamic voltage scaling (DVS) one of the most powerful techniques to reduce system power demands. Recently, techniques such as Razor DVS, voltage overscaling, and intelligent energy management have emerged as approaches to further reduce voltage by eliminating costly voltage margins inserted into traditional designs to ensure always-correct operation. The degree to which a global voltage controller can shave voltage margins is limited by imbalances in pipeline stage latency. Since all pipeline stages share the same voltage, the stage exercising the longest critical path will define the overall voltage of the system, even if other stages could potentially run at lower voltages. In this paper, we evaluate two local tuning mechanisms in the context of Razor DVS, a local voltage controller scheme that allows each pipeline stage its own voltage level, and a lower cost dynamic retiming scheme that incorporates per-stage clock delay elements to allow longer-latency pipeline stages to \"borrow\" time from shorter-latency stages. Using simulation, we draw two key insights from our study. First, mitigating pipeline stage imbalances render additional DVS energy savings. A Razor pipeline design with dynamic retiming finds an additional 12% energy savings over global voltage control (resulting in overall energy savings of more than 28% compared to fully-margined DVS). Second, we demonstrate that imbalances arise not only from design factors, but also from run-time characteristics. As the program (or program phase) changes, we see different logic paths in multiple stages exercised frequently, necessitating a dynamic fine-tuning of local control. This result suggests that even well-balanced pipelines could benefit from dynamic retiming.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127238521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Technology exploration for adaptive power and frequency scaling in 90nm CMOS","authors":"M. Meijer, F. Pessolano, J. P. D. Gyvez","doi":"10.1145/1013235.1013245","DOIUrl":"https://doi.org/10.1145/1013235.1013245","url":null,"abstract":"In this paper we examine the expectations and limitations of design technologies such as adaptive voltage scaling (AVS) and adaptive body biasing (ABB) in a modem deep sub-micron process. To serve this purpose, a set of ring oscillators was fabricated in a 90nm triple-well CMOS technology. The analysis hereby presented is based on two ring oscillators running at 822MHz and 93MHz, respectively. Measurement results indicate that it is possible to reach 13.8/spl times/ power savings by 3.4/spl times/ frequency downscaling using AVS, /spl plusmn/11% power and /spl plusmn/8% frequency tuning at nominal conditions using ABB only, 22/spl times/ power savings with 5/spl times/ frequency downscaling by combining AVS and ABB, as well as 22/spl times/ leakage reduction.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127540200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-layout leakage power minimization based on distributed sleep transistor insertion","authors":"P. Babighian, L. Benini, A. Macii, E. Macii","doi":"10.1145/1013235.1013275","DOIUrl":"https://doi.org/10.1145/1013235.1013275","url":null,"abstract":"This paper introduces a new approach to sub-threshold leakage power reduction in CMOS circuits. Our technique is based on automatic insertion of sleep transistors for cutting sub-threshold current when CMOS gates are in stand-by mode. Area and speed overhead caused by sleep transistor insertion are tightly controlled thanks to: (i) a post-layout incremental modification step that inserts sleep transistors in an existing row-based layout; (ii) an innovative algorithm that selects the subset of cells that can be gated for maximal leakage power reduction, while meeting user-provided constraints on area and delay increase. The presented technique is highly effective and fully compatible with industrial back-end flows, as demonstrated by post-layout analysts on several benchmarks placed and routed with state-of-the art commercial tools for physical design.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121882980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chanik Park, Jeong-Uk Kang, Seon-Yeong Park, Jinsoo Kim
{"title":"Energy-aware demand paging on NAND flash-based embedded storages","authors":"Chanik Park, Jeong-Uk Kang, Seon-Yeong Park, Jinsoo Kim","doi":"10.1145/1013235.1013317","DOIUrl":"https://doi.org/10.1145/1013235.1013317","url":null,"abstract":"The ever-increasing requirement for high-performance and huge-capacity memories of emerging embedded applications has led to the widespread adoption of SDRAM and NAND flash memory as main and secondary memories, respectively. In particular, the use of energy consuming memory, SDRAM, has become burdensome in battery-powered embedded systems. Intuitively, though demand paging can be used to mitigate the increasing requirement of main memory size, its applicability should be deliberately elaborated since NAND flash memory has asymmetric operation characteristics in terms of performance and energy consumption. In this paper, we present an energy-aware demand paging technique to lower the energy consumption of embedded systems considering the characteristics of interactive embedded applications with large memory footprints. We also propose a flash memory-aware page replacement policy that can reduce the number of write and erase operations in NAND flash memory. With real-life workloads, we show the system-wide energy-delay product can be reduced by 15/spl sim/30% compared to the traditional shadowing architecture.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121668247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing standby and active mode leakage power in deep sub-micron design","authors":"L. Clark, Rakesh J. Patel, T. Beatty","doi":"10.1145/1013235.1013239","DOIUrl":"https://doi.org/10.1145/1013235.1013239","url":null,"abstract":"Scaling has allowed rising transistor counts per die and increases leakage at an exponential rate, making power a primary constraint in all integrated circuit designs. Future designs must address emerging leakage components due to direct band to band tunneling, through MOSFET oxides and at steep junction doping gradients. In this paper, we describe circuit design techniques for managing leakage power, both during standby and for limiting the leakage power contribution during active operation. The efficacy, design effort, and process ramifications of different approaches are examined. The schemes are primarily aimed at hand-held devices such as cell phones, since the needs for low power are most acute in these markets due to limited battery capacity.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114020602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Location cache: a low-power L2 cache system","authors":"Rui Min, W. Jone, Yimin Hu","doi":"10.1145/1013235.1013271","DOIUrl":"https://doi.org/10.1145/1013235.1013271","url":null,"abstract":"While set-associative caches incur fewer misses than direct-mapped caches, they typically have slower hit times and higher power consumption, when multiple tag and data banks are probed in parallel. This paper presents the location cache structure which significantly reduces the power consumption for large set-associative caches. We propose to use a small cache, called location cache to store the location of future cache references. If there is a hit in the location cache, the supported cache is accessed as a direct-mapped cache. Otherwise, the supported cache is referenced as a conventional set-associative cache. The worst case access latency of the location cache system is the same as that of a conventional cache. The location cache is virtually indexed so that operations on it can be performed in parallel with the TLB address translation. These advantages make it ideal for L2 cache systems where traditional way-predication strategies perform poorly. We used the CACTI cache model to evaluate the power consumption and access latency of proposed cache architecture. Simplescalar CPU simulator was used to produce final results. It is shown that the proposed location cache architecture is power-efficient. In the simulated cache configurations, up-to 47% of cache accessing energy and 25% of average cache access latency can be reduced.","PeriodicalId":120002,"journal":{"name":"Proceedings of the 2004 International Symposium on Low Power Electronics and Design (IEEE Cat. No.04TH8758)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133881562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}