{"title":"Leakage power analysis and reduction during behavioral synthesis","authors":"K. Khouri, N. Jha","doi":"10.1109/ICCD.2000.878342","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878342","url":null,"abstract":"This paper presents a high-level leakage power analysis and reduction algorithm. The algorithm uses device-level models for leakage to pre-characterize a given register-transfer level module library. This is used to estimate the power consumption of a circuit due to leakage. The algorithm can also identify and extract the frequently idle modules in the datapath, which may be targeted for low-leakage optimization. Leakage optimization is based on the use of dual threshold voltage (V_T) technology. The algorithm prioritizes modules, giving a high-level synthesis (HLS) system an indication of where most gains for leakage reduction may be found. Results show that using a dual-V_T library during HLS can reduce leakage power by an average of 59% for the different technology generations. Total power can be reduced by an average of 18.8% to 45.4% for 0.18 µm to 0.07 µm technologies, respectively, compared to register-transfer level (RTL) circuits optimized for switching power only. The contribution of leakage power to overall power consumption of switching-power-optimized RTL circuits ranges from 23.5% to 54.1%. Our approach reduced these values to 11.4% to 25.9%.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132821640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective hardware-based two-way loop cache for high performance low power processors","authors":"T. Anderson, S. Agarwala","doi":"10.1109/ICCD.2000.878315","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878315","url":null,"abstract":"The increasing level of system-level integration coupled with the higher clock frequency of today's processors is increasing the power consumption of VLSI integrated circuits more rapidly than improvements in IC manufacturing can reduce power consumption. This paper presents a method for reducing the power consumption of DSP processors through the introduction of a two-way decoded loop-cache. By retaining decoded instruction information from two loops, the method has been shown to eliminate an average of 83% of instruction fetches and 84% of instruction decode activity.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117052351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive strategies for low-power RTOS scheduling","authors":"Pavan Kumar, M. Srivastava","doi":"10.1109/ICCD.2000.878306","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878306","url":null,"abstract":"Limiting the power consumption of real-time embedded systems is important, especially in portable systems (laptops, cellular phones) with tight power constraints. In this paper, we present a power-saving prediction strategy that exploits the fixed-priority scheduling of the real-time tasks running on these embedded systems. Power reduction is achieved by developing an efficient low-power scheme that predicts the expected execution time of real-time tasks and uses the system's idle time to schedule these tasks in low-power modes. In the process, a few tasks may miss their deadlines. This results in a tradeoff between power saved and deadlines missed. Our simulation results for different applications show that the proposed prediction mechanism achieves a high degree of power conservation with a very small penalty of missed deadlines. Our mechanism is simple and can be implemented in most real-time operating systems.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"32 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129979713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of instruction stream buffer with trace support for X86 processors","authors":"J. Chiu, I. Huang, C. Chung","doi":"10.1109/ICCD.2000.878299","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878299","url":null,"abstract":"The potential performance of superscalar microprocessors can be exploited only when fed with sufficient instruction bandwidth. The front-end units, the instruction stream buffer and the fetcher, are the key elements for achieving this goal. In most current processors, instruction stream buffers cannot support the instruction sequence beyond a basic block. The fetch rates are constrained by the branch barriers. In x86 processors, the split-line instruction problem worsens this constraint. We propose a design to improve instruction stream buffer performance by coupling it with a BTB to support trace prediction. According to the simulation results of such an instruction stream buffer, the maximum fetch bandwidth can reach 8.42 x86 instructions per cycle. Furthermore, we suggest that the instruction stream buffer consist of two 64-byte entries. Compared with other existing designs, this instruction stream buffer can improve performance by 90% over current x86 processor instruction fetching on average.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130091968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rectilinear block placement using B*-trees","authors":"G. Wu, Yun-Chih Chang, Yao-Wen Chang","doi":"10.1109/ICCD.2000.878307","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878307","url":null,"abstract":"Due to the layout complexity in deep sub-micron technology, integrated circuit blocks are often not rectangular. However, literature on general rectilinear block placement is still quite limited. In this paper, we present approaches for handling the placement for arbitrarily shaped rectilinear blocks, based on a newly developed data structure called B*-trees. Experimental results show that our algorithm achieves optimal or near optimal block placement for benchmarks with multiple shaped blocks.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126561359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking behavioral synthesis for a better integration within existing design flows","authors":"W. Cesário, A. Jerraya, Z. Sugar, I. Moussa","doi":"10.1109/ICCD.2000.878330","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878330","url":null,"abstract":"Although very popular and largely wanted, behavioral synthesis was never widely accepted by designers. This paper analyzes the reasons for this failure and introduces a new generation of behavioral synthesis tools with more practical synthesis schemes. The main breakthrough of this new generation is the redefinition of the behavioral synthesis flow to better profit from the power of modern RTL and FSM synthesis. The synthesis results for two large design examples, a 2-million-transistor ATM shaper and a motion estimator for a video codec (H.261 standard), are shown. They illustrate the effectiveness of this new approach when compared with RT-level design methodologies.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132645924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient logic optimization using regularity extraction","authors":"Thomas Kutzschebauch","doi":"10.1109/ICCD.2000.878327","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878327","url":null,"abstract":"This paper presents a new method to extract functionally equivalent structures from logic netlists. It uses a fast regularity extraction algorithm based on structural equivalence. The goal of the proposed algorithm is the speedup of logic optimization of large circuits by reusing functionally equivalent structures of the design. It is particularly suited for circuits containing a large number of datapaths. The regularity extraction algorithm uses an AND/XOR representation of the netlist to allow high correlation of functional and structural equivalence. It then extracts regular structures which can take any possible shape. The final optimization task is greatly reduced by optimizing only one copy of each regular structure while reusing the result for all other occurrences. In addition, structural regularity is widely preserved, resulting in higher packing density, shorter wiring length and improved delay during physical layout.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133774435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybridizing and coalescing load value predictors","authors":"Martin Burtscher, B. Zorn","doi":"10.1109/ICCD.2000.878272","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878272","url":null,"abstract":"Most well-performing load value predictors are hybrids that combine multiple predictors into one. Such hybrids are often large. To reduce their size and to improve their performance, this paper presents two storage reduction techniques as well as a detailed analysis of the interaction between a hybrid's components. We found that state sharing and simple value compression can shrink the size of a predictor by a factor of two without compromising the performance. Our component analysis revealed that combining well-performing predictors does not always yield a good hybrid, whereas sometimes a poor predictor can make an excellent complement to another predictor in a hybrid. Performance evaluations using a cycle-accurate simulator running SPECint95 show that hybridizing can improve non-hybrids by thirty to fifty percent over a wide range of sizes. With fifteen kilobytes of state, our coalesced-hybrid yields a harmonic mean speedup of twelve and fifteen percent with a re-fetch and a re-execute mis-prediction recovery mechanism, respectively, which is higher than the speedup of other predictors we evaluate, some of which are six times larger.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116252710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equivalence checking combining a structural SAT-solver, BDDs, and simulation","authors":"Viresh Paruthi, A. Kuehlmann","doi":"10.1109/ICCD.2000.878323","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878323","url":null,"abstract":"This paper presents a verification technique for functional comparison of large combinational circuits using a novel combination of known approaches. The idea is based on a tight integration of a structural satisfiability (SAT) solver, BDD sweeping, and random simulation; all three working on a shared graph representation of the circuit. The BDD sweeping and SAT solver are applied in an intertwined manner, both controlled by resource limits that are successively increased during each iteration. In this cooperative setting the BDD sweeping incrementally reduces the search space for the SAT solver until the problem is solved or the resource limits are exhausted. This approach improves on previous work in several ways: The integral application of the SAT solver significantly enhances the capacity and efficiency of BDD sweeping and extends its suitability for miscomparing designs. Further, the random simulation algorithm works on the compressed circuit graph and thus runs more efficiently. Our experiments demonstrate that the outlined approach is effective for a large class of equivalence checking instances by automatically adapting to the difficulty of the problem.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123473684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An application of genetic algorithms and BDDs to functional testing","authors":"Fabrizio Ferrandi, D. Sciuto, A. Fin, Franco Fummi","doi":"10.1109/ICCD.2000.878268","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878268","url":null,"abstract":"This paper describes a functional-level test pattern generator that combines two techniques: genetic algorithms (GAs) and binary decision diagrams (BDDs). The combined execution of these two techniques achieves better results for functional testing than either technique applied separately. The entire set of functional errors is examined in a shorter time and a more compact test set is produced. The paper analyzes the reason for this result. It mainly depends on the fact that errors that are hard to detect for GA-based testing techniques are easy to detect for BDD-based techniques, and vice versa. The two testing approaches are thus complementary and can effectively cooperate.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122071727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}