{"title":"Incremental gate sizing for late process changes","authors":"John Lee, Puneet Gupta","doi":"10.1109/ICCD.2010.5647778","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647778","url":null,"abstract":"Circuit design often runs in parallel with the development of the manufacturing process that will be used to fabricate it. However, as the manufacturing process matures, its models may undergo substantial changes as the design nears production. These changes may cause the design itself to fail its specifications, and in these cases it is necessary to perform an Engineering Change Order (ECO) to correct these problems. We present a new framework to perform incremental gate sizing for process changes late in the design cycle. This includes a method to measure and estimate ECO cost, transform these costs into a linear programming optimization problem, and solve the problem to find the ECO. This method performs well, compared to a leading commercial physical design tool, reducing ECO costs by 18% to 99% in changed area, and 1% to 96% in number of pins with unnecessary pin timing changes.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116784271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Fidelity Property of the Elmore Delay Model in actual comparison of routing algorithms","authors":"G. B. V. D. Santos, Tiago Reimann, M. Johann, R. Reis","doi":"10.1109/ICCD.2010.5647789","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647789","url":null,"abstract":"Despite the existence of several other alternatives for estimating interconnect delay, the Elmore Delay Model is still used to compare routing algorithms. The criterion used to establish Elmore's model as a reliable metric for this purpose is the so-called Fidelity Property. In this work we investigate the Fidelity Property using present-day interconnect parameters, in four routing scopes. For the first time, Fidelity is evaluated in an actual comparison of algorithms, one of the main uses it was established for. We find that the original methodology used to evaluate this property hides a significant standard deviation. This standard deviation strongly limits the ability of Elmore's model to reliably identify the best routing solution among several candidates. Additionally, the algorithm-comparison experiments show that different routing alternatives are appropriate for different routing scopes, with respect to metal layers, driver strengths and routing areas.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121944778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delay test quality maximization through process-aware selection of test set size","authors":"B. Arslan, A. Orailoglu","doi":"10.1109/ICCD.2010.5647687","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647687","url":null,"abstract":"The quality of a delay test set hinges not only on test patterns and the distribution of the delay defects but on the variations in process parameters as well. Because of process variations, the same delay test set detects particular delay defects at the same circuit node differently from die to die. Applying an identical test set to all devices, regardless of process variations, therefore uses test time inefficiently. This paper proposes a delay test technique that adaptively changes the size of the test set based on the position of the device in the process variation space in order to maximize test quality within a given test time.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129253481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing dual-Vt design with consideration of on-chip temperature variation","authors":"J. Gu, G. Qu, Lin Yuan","doi":"10.1109/ICCD.2010.5647619","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647619","url":null,"abstract":"Dual-Vt technology is effective in leakage reduction and has been implemented in industry EDA tools. However, on-chip temperature is regarded as uniformly distributed over the chip, with a pre-assumed value. This assumption does not hold for designs in the deep sub-micron domain as on-chip temperature variation becomes more and more significant. As a result, treating temperature as a constant will lead either to non-optimal designs in terms of leakage or to unreliable circuits due to potential hot spots whose temperature is higher than expected. In this paper, we propose a temperature-aware approach that leverages the on-chip temperature variation and takes into account the coupling effects between leakage and temperature to enhance the leakage reduction of any dual-Vt assignment algorithm. We synthesize and implement Opencore benchmarks using Synopsys tools and TSMC's 65nm low power dual-Vt library. The results show that we are able to improve the performance of a state-of-the-art dual-Vt algorithm by an average of 11.2% in leakage saving, a drop of more than 1.4°C in peak temperature, and a significant reduction of cells in hot regions without timing failure.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133436141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting the throughput of multiprocessor applications under dynamic workload","authors":"P. Poplavko, M. Geilen, T. Basten","doi":"10.1109/ICCD.2010.5647740","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647740","url":null,"abstract":"This work contributes to throughput calculation for real-time multiprocessor applications experiencing dynamic workload variations. We focus on a method to predict the system throughput when processing an arbitrarily long data frame given the meta-characteristics of the workload in that frame. This is useful for different purposes, such as resource allocation or dynamic voltage scaling in embedded systems. A sufficiently accurate analysis is not trivial when two factors are combined: parallelism and dynamic workload variations. In earlier work, two analysis methods showed good accuracy for several application examples, but no comparative experiments were carried out. In this work, we contribute new propositions to the theoretical basis of the previous methods. Based on these propositions, we remove a potential problem in a common subroutine and propose a new analysis method. We compare the methods experimentally. The new method significantly reduces the throughput prediction error, by up to 12%.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131925665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unified addition structure for moduli set {2<sup>n</sup>−1, 2<sup>n</sup>, 2<sup>n</sup>+1} based on a novel RNS representation","authors":"S. Timarchi, M. Fazlali, S. Cotofana","doi":"10.1109/ICCD.2010.5647761","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647761","url":null,"abstract":"Given that modulo 2<sup>n</sup>±1 are the most popular moduli in Residue Number Systems (RNS), a large variety of modulo 2<sup>n</sup>±1 adder designs have been proposed based on different number representations. However, in most cases, these encodings do not allow the implementation of a unified adder for all the moduli of the form 2<sup>n</sup>−1, 2<sup>n</sup>, and 2<sup>n</sup>+1. In this paper, we address the modular addition issue by introducing a new encoding, namely, the stored-unibit RNS. Moreover, we demonstrate how the proposed representation can be utilized to derive a unified design for the moduli set {2<sup>n</sup>−1, 2<sup>n</sup>, 2<sup>n</sup>+1}. Our approach enables a unified design for the moduli set adders, which opens the possibility to design reliable RNS processors with low hardware redundancy. Furthermore, the proposed representation can be utilized in conjunction with any fast state-of-the-art binary adder without requiring any extra hardware for end-around-carry addition.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134622137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and energy-efficient DSP systems via output probability processing","authors":"R. Abdallah, Naresh R Shanbhag","doi":"10.1109/ICCD.2010.5647569","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647569","url":null,"abstract":"This paper proposes to employ error statistics of nanoscale circuit fabrics to design robust, energy-efficient digital signal processing (DSP) systems. Architecture-level error statistics are exploited to generate the probability, or reliability, of each output bit of a DSP kernel. The proposed technique is referred to here as bit-level a posteriori probability processing (BLAPP). The energy efficiency and robustness of a 2D discrete cosine transform (2D-DCT) image codec employing BLAPP are studied. Simulations in a commercial 45nm CMOS process show that BLAPP provides up to 14X improvement in robustness, and 25% power savings over conventional 2D-DCT codec design.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114305073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High throughput, low set-up time, reconfigurable Linear Feedback Shift Registers","authors":"Rick J. M. Nas, K. V. Berkel","doi":"10.1109/ICCD.2010.5647572","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647572","url":null,"abstract":"This paper presents a hardware design for a scalable, high throughput, configurable LFSR. High throughput is achieved by producing L consecutive outputs per clock cycle with a clock cycle period that, for practical cases, increases only logarithmically with the block size L and the length of the register N. Flexibility is ensured by offering full reconfigurability of the generator polynomial within 1 clock cycle. At the heart of the design is a decomposition of the block-based state-update transition-matrix into two matrices, which enables an efficient implementation in terms of both latency and area. Potential target applications for this design include PN sequence generation in CDMA systems, BIST for VLSI circuits, CRC, encryption and error correction.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"164 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126221756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying optimal generic processors for biomedical implants","authors":"C. Strydis, D. Dave","doi":"10.1109/ICCD.2010.5647642","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647642","url":null,"abstract":"The extremely limited resource budget available to medical implants makes it imperative that they are designed as optimally as possible. The limited resources include, but are not limited to, battery life, expected system responsiveness, and chip area. We have previously described a design-space exploration (DSE) tool specifically geared towards finding the Pareto-optimal design front. In this paper, using real implants as case studies, we choose processor configurations from the Pareto-optimal processor set found by the DSE. We find that even under the extremely biased constraints that we use, our processor(s) perform better than many of the real implants. This provides strong hints towards designing an implant processor that is generic enough to cover most, if not all, implant applications.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129853159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combined optimal and heuristic approaches for multiple constant multiplication","authors":"J. Thong, N. Nicolici","doi":"10.1109/ICCD.2010.5647750","DOIUrl":"https://doi.org/10.1109/ICCD.2010.5647750","url":null,"abstract":"We propose new optimal and heuristic approaches for solving the multiple constant multiplication (MCM) problem. Bounded depth first search (BDFS), our proposed optimal algorithm, is benchmarked on problem sizes that are impractical for the existing optimal method. We focus on MCM problems with few constants but on large bit widths. In this scenario, we outperform the existing heuristics in minimizing the number of adders. In addition, subject to a given quality of solution, our run time is faster. We reuse our heuristics for pruning within BDFS.","PeriodicalId":182350,"journal":{"name":"2010 IEEE International Conference on Computer Design","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121728305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}