{"title":"Characterizing chip-multiprocessor variability-tolerance","authors":"S. Herbert, Diana Marculescu","doi":"10.1145/1391469.1391550","DOIUrl":"https://doi.org/10.1145/1391469.1391550","url":null,"abstract":"Spatially-correlated intra-die process variations result in significant core-to-core frequency variations in chip-multiprocessors. An analytical model for frequency island chip-multiprocessor throughput is introduced. The improved variability-tolerance of FI-CMPs over their globally-clocked counterparts is quantified across a range of core counts and sizes under constant die area. The benefits are highest for designs consisting of many small cores, with the throughput of a globally-clocked design with 70 small cores increasing by 8.8% when per-core frequency islands are used. The small- core FI-CMP also loses only 7.2% of its nominal performance to process variations, the least among any of the designs.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128314585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why should we do 3D integration?","authors":"W. Haensch","doi":"10.1145/1391469.1391643","DOIUrl":"https://doi.org/10.1145/1391469.1391643","url":null,"abstract":"3D integration offers a technology that meets the requirements of the current trend in high performance microprocessors. It offers the opportunity to continue the performance trends the industry enjoyed in the past. To take advantage of this opportunity system architecture and design needs to utilize the new possibilities that 3D integration provides.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129842814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring locking & partitioning for predictable shared caches on multi-cores","authors":"Vivy Suhendra, T. Mitra","doi":"10.1145/1391469.1391545","DOIUrl":"https://doi.org/10.1145/1391469.1391545","url":null,"abstract":"Multi-core architectures consisting of multiple processing cores on a chip have become increasingly prevalent. Synthesizing hard realtime applications onto these platforms is quite challenging, as the contention among the cores for various shared resources leads to inherent timing unpredictability. This paper proposes the use of shared cache in a predictable manner through a combination of locking and partitioning mechanisms. We explore possible design choices and evaluate their effects on the worst-case application performance. Our study reveals certain design principles that strongly dictate the performance of a predictable memory hierarchy.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121293692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic modeling of a thermally-managed multi-core system","authors":"Hwisung Jung, Peng Rong, Massoud Pedram","doi":"10.1145/1391469.1391657","DOIUrl":"https://doi.org/10.1145/1391469.1391657","url":null,"abstract":"Achieving high performance under a peak temperature limit is a first-order concern for VLSI designers. This paper presents a new abstract model of a thermally-managed system, where a stochastic process model is employed to capture the system performance and thermal behavior. We formulate the problem of dynamic thermal management (DTM) as the problem of minimizing the energy cost of the system for a given level of performance under a peak temperature constraint by using a controllable Markovian decision process (MDP) model. The key rationale for utilizing MDP for solving the DTM problem is to manage the stochastic behavior of the temperature states of the system under online re-configuration of its micro-architecture and/or dynamic voltage-frequency scaling. Experimental results demonstrate the effectiveness of the modeling framework and the proposed DTM technique.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"317 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116364670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Signature based Boolean matching in the presence of don’t cares","authors":"A. Abdollahi","doi":"10.1145/1391469.1391635","DOIUrl":"https://doi.org/10.1145/1391469.1391635","url":null,"abstract":"Boolean matching is to determine whether two functions are equivalent under input permutation and input/output phase assignment. This paper will address the Boolean Matching problem for incompletely specified functions. Signatures have previously been used for Boolean matching of completely specified functions. In this paper for the first time we use signatures to determine the equivalency of incompletely specified functions.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116502608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges in gate level modeling for delay and SI at 65nm and below","authors":"Igor Keller, K. Tam, Vinod Kariat","doi":"10.1145/1391469.1391590","DOIUrl":"https://doi.org/10.1145/1391469.1391590","url":null,"abstract":"In this paper we review the prior art and recent advances in the area of standard cell modeling for delay and noise analyses, suggest a taxonomy of different cell models, and discuss their strengths and weaknesses. We also discuss challenges in cell modeling for delay and noise analyses arising in new submicron process nodes.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"361 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121642834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SHIELD: A software hardware design methodology for security and reliability of MPSoCs","authors":"K. Patel, S. Parameswaran","doi":"10.1145/1391469.1391686","DOIUrl":"https://doi.org/10.1145/1391469.1391686","url":null,"abstract":"Security of MPSoCs is an emerging area of concern in embedded systems. Security is jeopardized by code injection attacks, which are the most common types of software attacks. Previous attempts to detect code injection in MPSoCs have been burdened with significant performance overheads. In this work, we present a hardware/software methodology \"SHIELD\" to detect code injection attacks in MPSoCs. SHIELD instruments the software programs running on application processors in the MPSoC and also extracts control flow and basic block execution time information for runtime checking. We employ a dedicated security processor (monitor processor) to supervise the application processors on the MPSoC. Custom hardware is designed and used in the monitor and application processors. The monitor processor uses the custom hardware to rapidly analyze information communicated to it from the application processors at runtime. We have implemented SHIELD on a commercial extensible processor (Xtensa LX2) and tested it on a multiprocessor JPEG encoder program. In addition to code injection attacks, the system is also able to detect 83% of bit flips errors in the control flow instructions. The experiments show that SHIELD produces systems with runtime which is at least 9 times faster than the previous solution. SHIELD incurs a runtime (clock cycles) performance overhead of only 6.6% and an area overhead of 26.9%, when compared to a non-secure system.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128141197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A practical reconfigurable hardware accelerator for boolean satisfiability solvers","authors":"John D. Davis, Zhangxi Tan, Fang Yu, Lintao Zhang","doi":"10.1145/1391469.1391669","DOIUrl":"https://doi.org/10.1145/1391469.1391669","url":null,"abstract":"We present a practical FPGA-based accelerator for solving Boolean Satisfiability problems (SAT). Unlike previous efforts for hardware accelerated SAT solving, our design focuses on accelerating the most time consuming part of the SAT solver - Boolean Constraint Propagation (BCP), leaving the choices of heuristics such as branching order, restarting policy, and learning and backtracking to software. Our novel approach uses an application-specific architecture instead of an instance-specific one to avoid time-consuming FPGA synthesis for each SAT instance. By avoiding global signal wires and carefully pipelining the design, our BCP accelerator is able to achieve much higher clock frequency than that of previous work. In addition, it can load SAT instances in milliseconds, can handle SAT instances with tens of thousands of variables and clauses using a single FPGA, and can easily scale to handle more clauses by using multiple FPGAs. Our evaluation on a cycle-accurate simulator shows that the FPGA co-processor can achieve 3.7-38.6x speedup on BCP compared to state-of-the-art software SAT solvers.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"81 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128147189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated design of self-adjusting pipelines","authors":"Jieyi Long, S. Memik","doi":"10.1145/1391469.1391523","DOIUrl":"https://doi.org/10.1145/1391469.1391523","url":null,"abstract":"We propose a self-adjusting pipeline structure to enhance chip performance and robustness considering the effects of process variations. We achieve this by introducing delay sensors to monitor internal timing violations within a pipeline stage and variable clock skew buffers to adjust the timing of the pipeline stage based on the feedback from the delay sensors. Furthermore, we formulate the delay sensor insertion and variable clock skew configuration problem as a stochastic mixed-integer programming problem and propose a simulated-annealing based algorithm to solve it. A comparison between the designs with and without the self-adjusting enhancement reveals that, we are able to improve the average performance of a batch of chips by 9.5%.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125595186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-time instruction set selection in a transmutable embedded processor","authors":"L. Bauer, M. Shafique, J. Henkel","doi":"10.1145/1391469.1391486","DOIUrl":"https://doi.org/10.1145/1391469.1391486","url":null,"abstract":"We are presenting a new concept of an application-specific processor that is capable of transmuting its instruction set according to non-predictive application behavior during run-time. In those scenarios, current (extensible) embedded processors are less efficient since they are not run-time adaptive. We have identified the instruction set selection to be a critical step to perform at run time and hence we focus this paper on that crucial part. Our paradigm conducts as many steps as possible at compile/design time and as little as necessary at run time with the constraint to provide a sufficient flexibility to react to non-predictive application behavior efficiently We provide an in-depth analysis of our scheme and achieve a speed-up of up to 7.19times (average: 3.63times) compared to state-of-the-art adaptive approaches (like [19]). As an application, we have employed a whole H.264 video encoder though our scheme is by principle applicable to many other embedded applications. Our results are evaluated by an implementation of the instruction set selection for our transmutable processor on an FPGA platform.","PeriodicalId":412696,"journal":{"name":"2008 45th ACM/IEEE Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130230970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}