{"title":"Design of delay-insensitive three dimension pipeline array multiplier for image processing","authors":"A. Taubin, K. Fant, J. McCardle","doi":"10.1109/ICCD.2002.1106755","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106755","url":null,"abstract":"This paper presents a novel delay-insensitive three dimension pipeline array multiplier. The organization combines deep (gate-level) pipelining of Manchester adders with a two dimensional cross-pipeline mesh for multiplicand and multiplier propagation and partial product bits calculation. Fine grain pipelining with elimination of broadcasting and completion trees leads to high-throughput without use of dynamic logic that leaves the door open for further improvement of performance.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132128466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Floating-point fused multiply-add with reduced latency","authors":"T. Lang, J. Bruguera","doi":"10.1109/ICCD.2002.1106762","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106762","url":null,"abstract":"We propose an architecture for the computation of the floating-point multiply-add-fused (MAF) operation A+ (B /spl times/ C). This architecture is based on the combined addition and rounding (using a dual adder) and on the anticipation of the normalization step before the addition. Because the normalization is performed before the addition, it is not possible to overlap the leading-zero-anticipator with the adder. Consequently, to avoid the increase in delay we modify the design of the LZA so that the leading bits of its output are produced first and can be used to begin the normalization. Moreover, parts of the addition are also anticipated. We have estimated the delay of the resulting architecture for double-precision format, considering the load introduced by long connections, and estimate a reduction of about 15% to 20% with respect to traditional implementations of the floating-point MAF unit.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126389632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive pipeline depth control for processor power-management","authors":"A. Efthymiou, J. Garside","doi":"10.1109/ICCD.2002.1106812","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106812","url":null,"abstract":"A method of managing the power consumption of an embedded, single-issue processor by controlling its pipeline depth is proposed. The execution time will be increased but, if the method is applied to applications with slack time, the user-perceived performance may not be degraded Two techniques are shown using an existing asynchronous processor as a starting point. The first method controls the pipeline occupancy using a token mechanism, the second enables adjacent pipeline stages to be merged, by making the latches between them 'permanently' transparent. An energy reduction of up to 16% is measured, using a collection of five benchmarks.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121334725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Don't-care identification on specific bits of test patterns","authors":"K. Miyase, S. Kajihara, I. Pomeranz, S. Reddy","doi":"10.1109/ICCD.2002.1106769","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106769","url":null,"abstract":"Given a test set for stuck-at faults, a primary input value may be changed to the opposite logic value without losing fault coverage. One can regard such a value as a don't-care (X). The don't care values can be filled appropriately to achieve test compaction, test data compression, or power reduction during testing. However, these uses are better served if the don't cares can be placed in desired/specific bit positions of the test patterns. In this paper, we present a method for maximally fixing Xs on specific bits of given test vectors. Experimental results on ISCAS benchmark circuits show how the proposed method can increase the number of Xs on specific bits compared with an earlier proposed method.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116333864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A distributed computation platform for wireless embedded sensing","authors":"A. Savvides, M. Srivastava","doi":"10.1109/ICCD.2002.1106774","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106774","url":null,"abstract":"We present a low cost wireless microsensor node architecture for distributed computation and sensing in massively distributed embedded systems. Our design focuses on the development of a versatile, low power device to facilitate experimentation and initial deployment of wireless microsensor nodes in deeply embedded systems. This paper provides the details of our architecture and introduces fine-grained node localization as an example application of distributed computation and wireless embedded sensing.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131302531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient PEEC-based inductance extraction using circuit-aware techniques","authors":"Haitian Hu, S. Sapatnekar","doi":"10.1109/ICCD.2002.1106808","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106808","url":null,"abstract":"Practical approaches for on-chip inductance extraction to obtain a sparse, stable and accurate inverse inductance matrix K are proposed. The novelty of our work is in using circuit characteristics to define the concept of resistance-dominant and inductance-dominant lines. This notion is used to progressively refine a set of clusters that are inductively tightly-coupled. For reasonable designs, the more exact algorithm yields a sparsification of 97% for delay and oscillation magnitude errors of 10% and 15%, respectively, while the more approximate algorithm achieves up to 99% sparsification. An offshoot of this work is K-PRIMA, an extension of PRIMA to handle K matrices with guaranteed passivity.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130960857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhigang Hu, Philo Juang, K. Skadron, D. Clark, M. Martonosi
{"title":"Applying decay strategies to branch predictors for leakage energy savings","authors":"Zhigang Hu, Philo Juang, K. Skadron, D. Clark, M. Martonosi","doi":"10.1109/ICCD.2002.1106809","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106809","url":null,"abstract":"With technology advancing toward deep submicron, leakage energy is of increasing concern, especially for large onchip array structures such as caches and branch predictors. Recent work has suggested that even larger branch predictors can and should be used in order to improve microprocessor performance. A further consideration is that the branch predictor is a thermal hot spot, thus further increasing its leakage. For these reasons, it is natural to consider applying decay techniques-already shown to reduce leakage energy for caches-to branch-prediction structures. Due to the structural difference between caches and branch predictors, applying decay techniques to branch predictors is not straightforward. This paper explores the strategies for exploiting spatial and temporal locality to make decay effective for bimodal, gshare, and hybrid predictors, as well as the branch target buffer Overall, this paper demonstrates that decay techniques apply more broadly than just to caches, but that careful policy and implementation make the difference between success and failure in building decay-based branch predictors. Multi-component hybrid predictors offer especially interesting implementation tradeoffs for decay.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128518468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedded operating system energy analysis and macro-modeling","authors":"T. K. Tan, A. Raghunathan, N. Jha","doi":"10.1109/ICCD.2002.1106822","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106822","url":null,"abstract":"A large and increasing number of modern embedded systems are subject to tight power/energy constraints. It has been demonstrated that the operating system (OS) can have a significant impact on the energy efficiency of the embedded system. Hence, analysis of the energy effects of the OS is of great importance. Conventional approaches to energy analysis of the OS (and embedded software, in general) require the application software to be completely developed and integrated with the system software, and that either measurement on a hardware prototype or detailed simulation of the entire system be performed. Since this process requires significant design effort, unfortunately, it is typically too late in the design cycle to perform high-level or architectural optimizations on the embedded software, restricting the scope of power savings. Our work recognizes the need to provide embedded software designers with feedback about the effect of different OS services on energy consumption early in the design cycle. As a first step in that direction, this paper presents a systematic methodology to perform energy analysis and macro-modeling of an embedded OS. Our energy macro-models provide software architects and developers with an intuitive model for the OS energy effects, since they directly associate energy consumption with OS services and primitives that are visible to the application software. Our methodology consists of (i) an analysis stage, where we identify a set of energy components, called energy characteristics, which are useful to the designer in making OS-related design trade-offs, and (ii) a subsequent macromodeling stage, where we collect data for the identified energy components and automatically derive macro-models for them. We validate our methodology by deriving energy macro-models for two state-of-the-art embedded OS's, /spl mu/C/OS and Linux OS.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130402375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From ASIC to ASIP: the next design discontinuity","authors":"K. Keutzer, S. Malik, A. Newton","doi":"10.1109/ICCD.2002.1106752","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106752","url":null,"abstract":"A variety of factors is making it increasingly difficult and expensive to design and manufacture traditional Application Specific Integrated Circuits (ASICs). This has started a significant move towards the use of programmable solutions of various forms - increasingly referred to as programmable platforms. For the platform manufacturer, programmability provides higher volume to amortize design and manufacturing costs, as the same platform can be used over multiple related applications, as well as over generations of an application. For the application implementer, programmability provides a lower risk and shorter time-to-market implementation path. The flexibility provided by programmability comes with a performance and power overhead. This can be significantly mitigated by using application specific platforms, also referred to as Application Specific Instruction Set Processors (ASIPs). This paper details the reasons for this significant change in application implementation philosophy, provides illustrative contemporary evidence of this change, examines the space of application specific platforms, outlines fundamental problems in their development, and finally presents a methodology to deal with this changing design style.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133818350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerated SAT-based scheduling of control/data flow graphs","authors":"S. Memik, F. Fallah","doi":"10.1109/ICCD.2002.1106801","DOIUrl":"https://doi.org/10.1109/ICCD.2002.1106801","url":null,"abstract":"In this paper we present a satisfiability-based approach to the scheduling problem in high-level synthesis. We formulate the resource constrained scheduling as a satisfiability (SAT) problem. We present experimental results on the performance of the state-of-the-art SAT solver Chaff, and demonstrate techniques to reduce the SAT problem size by applying bounding techniques on the scheduling problem. In addition, we demonstrate the use of transformations on control data flow graphs such that the same lower bound techniques can operate on them as well. Our experiments show that Chaff is able to outperform the integer linear program (ILP) solver CPLEX in terms of CPU time by as much as 59 fold. Finally, we conclude that the satisfiability-based approach is a promising alternative for obtaining optimal solutions to NP-complete scheduling problem instances.","PeriodicalId":164768,"journal":{"name":"Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors","volume":"94 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116821134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}