{"title":"REMiS: Run-time energy minimization scheme in a reconfigurable processor with dynamic power-gated instruction set","authors":"M. Shafique, L. Bauer, J. Henkel","doi":"10.1145/1687399.1687411","DOIUrl":"https://doi.org/10.1145/1687399.1687411","url":null,"abstract":"Reconfigurable processors provide a means for flexible and energy-aware computing. In this paper, we present a new scheme for runtime energy minimization (REMiS) as part of a dynamically reconfigurable processor that is exposed to run-time varying constraints like performance and footprint (i.e. amount of reconfigurable fabric). The scheme chooses an energy-minimizing set of so-called Special Instructions (considering leakage, dynamic, and reconfiguration energy) and then 'power-gates' a temporarily unused subset of the Special Instruction set. We provide a comprehensive evaluation for different technologies (ranging from 65 nm to 150 nm) and thereby show that our scheme is technology independent, i.e. it is beneficial for various technologies alike. By means of an H.264 video encoder we demonstrate that for certain performance constraints our scheme (applied to our in-house reconfigurable processor) achieves an overall energy saving of up to 40.8% (avg. 24.8%) compared to a performance-maximizing scheme. We also demonstrate that our scheme is equally beneficial to various other state-of-the-art reconfigurable processor architectures like Molen where it achieves energy savings of up to 48.7% (avg. 28.93%) at 65 nm. We have employed an H.264 encoder within this paper as an application in order to demonstrate the strengths of our scheme, since H.264's complexity and run-time unpredictability present a challenging scenario for state-of-the-art architectures.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115308002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrinsic NBTI-variability aware statistical pipeline performance assessment and tuning","authors":"B. Vaidyanathan, A. Oates, Yuan Xie","doi":"10.1145/1687399.1687429","DOIUrl":"https://doi.org/10.1145/1687399.1687429","url":null,"abstract":"Random process variation and variability intrinsic to PMOS Negative Bias Temperature Instability (NBTI-induced statistical variation) are two major reliability concerns as transistor dimensions scale with technology. Previous works have studied these two sources of variation separately at device and circuit level. We study the impact of the interaction between intrinsic PMOS NBTI variability and time process variability on circuit delay spread. A statistical pipeline timing error model is proposed including both the variability sources to predict its impact on pipeline stage count. It is shown that a wide difference in statistical timing response to intrinsic NBTI variability exists among different circuits. Traditional design time NBTI-aware delay guard-banding is proved to be statistically insufficient in pipelines and an excess of 2x guard-band needs to be incorporated at the end of 10 years. However, the guard-band is shown to be reduced by 30% when the dynamic cycle time stealing technique is employed.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"198 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127312866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Timing Arc based logic analysis for false noise reduction","authors":"Murthy Palla, J. Bargfrede, Stephan Eggersglüß, W. Anheier, R. Drechsler","doi":"10.1145/1687399.1687440","DOIUrl":"https://doi.org/10.1145/1687399.1687440","url":null,"abstract":"The problem of accurately calculating the impact of crosstalk on a circuit considering its inherent logic and timing properties is very complex. Although it has been widely studied, it still lacks an efficient solution. As a result, state-of-the-art crosstalk calculators use simplistic and overly pessimistic models resulting in the over-estimation of crosstalk effects. Such pessimism in crosstalk analysis often leads to the triggering of false violations and consequently an inefficient use of design resources. The main contribution of this paper is a novel technique called Timing Arc Based Logic Analysis (TABLA) that serves as an efficient means to calculate realistic crosstalk bounds. TABLA uses timing arcs as basic elements to perform an efficient temporal logic analysis employing the min-max timing model using dedicated solvers for logic and timing. Additionally, a procedure to generate powerful conflict clauses is proposed to improve the run time of the overall analysis. The proposed technique has been tested in an industrial environment on benchmark circuits as well as on an industrial design, and results are provided.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127599966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GHM: A generalized Hamiltonian method for passivity test of impedance/admittance descriptor systems","authors":"Zheng Zhang, Chi-Un Lei, N. Wong","doi":"10.1145/1687399.1687541","DOIUrl":"https://doi.org/10.1145/1687399.1687541","url":null,"abstract":"A generalized Hamiltonian method (GHM) is proposed for passivity test of descriptor systems (DSs) which describe impedance or admittance input-output responses. GHM can test passivity of DSs with any system index without minimal realization. This frequency-independent method can avoid the time-consuming system decomposition as required in many existing DS passivity test approaches. Furthermore, GHM can test systems with singular D + D^T where the traditional Hamiltonian method fails, and enjoys a more accurate passivity violation identification compared to frequency sweeping techniques. Numerical results have verified the effectiveness of GHM. The proposed method constitutes a versatile tool to speed up passivity check and enforcement of DSs and subsequently ensures globally stable simulations of electrical circuits and components.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127054612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voltage binning under process variation","authors":"V. Zolotov, C. Visweswariah, Jinjun Xiong","doi":"10.1145/1687399.1687480","DOIUrl":"https://doi.org/10.1145/1687399.1687480","url":null,"abstract":"Process variation is recognized as a major source of parametric yield loss, which occurs because a fraction of manufactured chips do not satisfy timing or power constraints. On the other hand, both chip performance and chip leakage power depend on supply voltage. This dependence can be used for converting the fraction of too slow or too leaky chips into good ones by adjusting their supply voltage. This technique is called voltage binning. All the manufactured chips are divided into groups (bins) and each group is assigned its individual supply voltage. This paper proposes a statistical technique of yield computation for different voltage binning schemes using results of statistical timing and variational power analysis. The paper formulates and solves the problem of computing optimal supply voltages for a given binning scheme.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128052021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"POWER7 — Verification challenge of a multi-core processor","authors":"Klaus-Dieter Schubert","doi":"10.1145/1687399.1687551","DOIUrl":"https://doi.org/10.1145/1687399.1687551","url":null,"abstract":"Over the years functional hardware verification has made significant progress in the areas of traditional simulation techniques, hardware accelerator usage and last but not least formal verification approaches. This has been sufficient to deal with the additional design content and complexity increase that has been happening at the same time. For POWER7, IBM's first high end 8-core microprocessor, these incremental improvements in verification have been deemed not to be enough by themselves, because the chip was not just a remap of an existing design with more cores. The infrastructure on the chip had to be changed significantly, while at the same time the business side requested a shorter development cycle with perfect quality but without growing the team. Looking at these constraints a two phase approach seemed to be the only solution. This paper commences with the highlights of the first phase, where improvements to the existing process have been identified. This includes topics ranging from enhanced test case generation, over advancements in structural checking to the extensions of the formal verification scope both in property checking and sequential equivalence checking. At the same time, the paper describes the second phase which has targeted the exploitation of synergy across the various verification activities. The active interlock between simulation, formal verification and the design has helped to reduce workload and improved the project schedule. And the usage of coverage in a holistic way from unit level simulation to acceleration has led to new innovations and new insight, which improved the overall verification process. Finally, an outlook on future challenges and future trends is given.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126665414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel multi-level analytical global placement on graphics processing units","authors":"J. Cong, Yi Zou","doi":"10.1145/1687399.1687525","DOIUrl":"https://doi.org/10.1145/1687399.1687525","url":null,"abstract":"GPU platforms are becoming increasingly attractive for implementing accelerators because they feature a larger number of cores with improved programmability. In this paper, we describe our implementation of a state-of-the-art academic multi-level analytical placer mPL on Nvidia's massively parallel GT200 series platforms. We detail our efforts on performance tuning and optimizations. When compared to a software implementation on Intel's recent generation Xeon CPU, the global placement part of mPL is 15× faster on average using a Tesla C1060 card, with comparable WL (less than 1% WL degradation on average).","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"244 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131435523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computing quadratic approximations for the isochrons of oscillators: A general theory and advanced numerical methods","authors":"O. Suvak, A. Demir","doi":"10.1145/1687399.1687475","DOIUrl":"https://doi.org/10.1145/1687399.1687475","url":null,"abstract":"We first review the notion of isochrons for oscillators, which has been developed and heavily utilized in mathematical biology in studying biological oscillations. Isochrons were instrumental in introducing a notion of generalized phase for an oscillation and form the basis for oscillator perturbation analysis formulations. Calculating the isochrons of an oscillator is a very difficult task. Except for some very simple planar oscillators, isochrons cannot be calculated analytically and one has to resort to numerical techniques. Previously proposed numerical methods for computing isochrons can be regarded as brute-force, and they become totally impractical for non-planar oscillators with dimension more than two. In this paper, we present a precise and carefully developed theory and advanced numerical techniques for computing local but quadratic approximations for isochrons. Previous work offers the theory and the numerical methods needed for computing only linear approximations for isochrons. Our treatment is general and applicable to oscillators with large dimension. We present examples for isochron computations, verify our results against exact calculations in a simple case, and allude to several applications among many where quadratic approximations of isochrons will be of use.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"274 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131671325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DynaTune: Circuit-level optimization for timing speculation considering dynamic path behavior","authors":"Lu Wan, Deming Chen","doi":"10.1145/1687399.1687430","DOIUrl":"https://doi.org/10.1145/1687399.1687430","url":null,"abstract":"Traditional circuit design focuses on optimizing the static critical paths no matter how infrequently these paths are exercised dynamically. Circuit optimization is then tuned to the worst-case conditions to guarantee error-free computation but may also lead to very inefficient designs. Recently, there have been processor works that over-clock the chip to achieve higher performance to the point where timing errors occur, and then perform error correction through either circuit-level or microarchitecture-level techniques. This approach in general is referred to as Timing Speculation. In this paper, we propose a new circuit optimization technique \"DynaTune\" for timing speculation based on the dynamic behavior of a circuit. DynaTune optimizes the most dynamically critical gates of a circuit and improves the circuit's throughput under a fixed power budget. We test this proposed technique with two timing speculation schemes: Telescopic Unit (TU) and Razor Logic (RZ). Experimental results show that applying DynaTune on the Leon3 processor can increase the throughput of critical modules by up to 13% and 20% compared to the timing-speculative and non-timing-speculative results optimized by Synopsys Design Compiler, respectively. For MCNC benchmark circuits, DynaTune combined with TU can provide 9% and 20% throughput gains on average compared to timing-speculative and non-timing-speculative results optimized by Design Compiler. When combined with RZ, DynaTune can achieve 8% and 15% throughput gains on average for the above experiments.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130441392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory organization and data layout for instruction set extensions with architecturally visible storage","authors":"Panagiotis Athanasopoulos, P. Brisk, Y. Leblebici, P. Ienne","doi":"10.1145/1687399.1687527","DOIUrl":"https://doi.org/10.1145/1687399.1687527","url":null,"abstract":"Present application-specific embedded systems tend to choose instruction set extensions (ISEs) based on limitations imposed by the available data bandwidth to custom functional units (CFUs). Adoption of the optimal ISE for an application would, in many cases, impose a formidable cost increase in order to achieve the required data bandwidth. In this paper we propose a novel methodology for laying out data in memories, generating high-bandwidth memory systems by making use of existing low-bandwidth, low-cost ones, and designing custom functional units, all with the desirable data bandwidth for only a fraction of the additional cost required by traditional techniques.","PeriodicalId":256358,"journal":{"name":"2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116612200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}