{"title":"A new processor architecture for digital signal transport systems","authors":"Minoru Inamori, K. Ishii, A. Tsutsui, K. Shirakawa, H. Nakada, T. Miyazaki","doi":"10.1109/ICCD.1997.628863","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628863","url":null,"abstract":"This paper proposes a new processor architecture for manipulating protocols in digital signal transport systems. To realize flexible and high-performance digital signal transport systems, the architecture has unique application-specific hardware with a core CPU. It is derived from an analysis of functions in real systems. A computer simulation confirms the efficiency of the architecture.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115365438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An integrated placement and synthesis approach for timing closure of PowerPC™ microprocessors","authors":"S. Hojat, P. Villarrubia","doi":"10.1109/ICCD.1997.628869","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628869","url":null,"abstract":"This paper describes an approach for tight integration between a synthesis and a placement tool. The purpose of this integration is to improve timing convergence of advanced microprocessors. It is shown that this approach results in \"legal\" placements with, in general, lower delay and design size. More significantly, the number of iterations to reach timing closure is reduced drastically. The wire length estimates traditionally used to drive timing optimization in synthesis are inadequate. Instead, the integrated approach leads to enhanced results as well as faster timing convergence. The impact of various parameters in synthesis and placement on the final results is shown.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123749416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equivalence checking using abstract BDDs","authors":"S. Jha, Yuan Lu, M. Minea, E. Clarke","doi":"10.1109/ICCD.1997.628891","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628891","url":null,"abstract":"We introduce a new equivalence checking method based on abstract BDDs (aBDDs). The basic idea is the following: given an abstraction function, aBDDs reduce the size of BDDs by merging nodes that have the same abstract value. An aBDD has bounded size and can be constructed without constructing the original BDD. We show that this method of equivalence checking is always sound. It is complete for an important class of arithmetic circuits that includes integer multiplication. We also suggest heuristics for finding suitable abstraction functions based on the structure of the circuit. The efficiency of this technique is illustrated by experiments on ISCAS'85 benchmark circuits.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125351812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous transpose-matrix architectures","authors":"J. Tierno, P. Kudva","doi":"10.1109/ICCD.1997.628904","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628904","url":null,"abstract":"The matrix transposition operation is a necessary step in several image/video compression and decompression algorithms, in particular the discrete cosine transform (DCT) and its inverse (IDCT), and some distributed arithmetic applications. These algorithms have to be performed at high data-rates, and with a minimum of power dissipation for portable applications. The authors describe how the clocked solution is usually implemented, and present two new asynchronous architectures that perform matrix transposition. These architectures, one based on two phase signaling, one based on four phase signaling, have better characteristics than the clocked solution in terms of latency and power, at no cost in area or throughput. They discuss the characteristics of these three architectures and evaluate the relative advantages of each one.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125438695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast low-energy VLSI binary addition","authors":"K. Parhi","doi":"10.1109/ICCD.1997.628938","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628938","url":null,"abstract":"This paper presents novel architectures for fast binary addition which can be implemented using multiplexers only. Binary addition is carried out using a fast redundant-to-binary converter. It is shown that appropriate encoding of the redundant digits and recasting the binary addition as a redundant-to-binary conversion reduces the latency of addition from W t_fa to W t_mux, where t_fa and t_mux represent binary full adder and multiplexer delays, respectively, and W is the word length. A family of fast converter architectures is developed based on tree-type (obtained using lookahead techniques) and carry-select approaches. The carry-generation component is the critical component in redundant-to-binary conversion and binary addition. It is shown that the fastest binary addition can be performed using (Wlog_2+W+1) multiplexers in time (log_2 W+2) t_mux. If the specified adder latency is greater than (log_2 W+2) t_mux, then a family of converters using the fewest multiplexers can be designed based on the carry-select approach. Finally, a class of hybrid adders is designed by using a carry-select configuration and by substituting tree-based blocks in place of some carry-select blocks. It is shown that this approach can lead to adder designs which consume the least energy.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114403028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An approach to network caching for multimedia objects","authors":"M. Kozuch, W. Wolf, A. Wolfe","doi":"10.1109/ICCD.1997.628879","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628879","url":null,"abstract":"Caching is an important mechanism for improving both the performance and operational cost of multimedia networks. This paper presents a new approach to network caching for large multimedia objects, the Caching Tree Algorithm, which is based on previous research regarding the File Allocation Problem. This novel algorithm is both distributed and computationally tractable. We also describe some practical considerations in the application of the algorithm to modern networks.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117007801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Properties of the input pattern fault model","authors":"R. D. Blanton, John P. Hayes","doi":"10.1109/ICCD.1997.628897","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628897","url":null,"abstract":"Recent work in IC failure analysis strongly indicates the need for fault models that directly analyze the function of circuit primitives. The input pattern (IP) fault model is a functional fault model that allows for both complete and partial functional verification of every circuit module, independent of the design level. We describe the IP fault model and provide a method for analyzing IP faults using standard SSL-based fault simulators and test generation tools. The method is used to generate test sets that target the IP faults of the ISCAS85 benchmark circuits and a carry-lookahead adder. Improved IP fault coverage for the benchmarks and the adder is obtained by adding a small number of test patterns to tests that target only SSL faults. We also conducted fault simulation experiments that show IP test patterns are effective in detecting non-targeted faults such as bridging and transistor stuck-on faults. Finally, we discuss the notion of IP redundancy and show how large amounts of this redundancy exist in the benchmarks and in SSL-irredundant adder circuits.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114878624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-level design synthesis of a low power, VLIW processor for the IS-54 VSELP speech encoder","authors":"R. Henning, C. Chakrabarti","doi":"10.1109/ICCD.1997.628923","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628923","url":null,"abstract":"General purpose DSPs typically used to implement speech coders in digital cellular phones do not allow enough exploitation of the speech coding algorithm itself for power reduction. In this paper, high-level design synthesis of a low power, VLIW (very long instruction word) processor dedicated to implementing the IS-54 VSELP speech encoding algorithm is presented. Significant power reduction is achieved through algorithm dependent techniques, including application specific hardware design, supply voltage reduction through highly parallel execution, and exploitation of data correlation inherent to the algorithm. Preliminary estimates indicate that the design could result in a 5.35 mm² processor that executes in real-time with an average power dissipation of about 28 mW.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126204928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast generation of statistically-based worst-case modeling of on-chip interconnect","authors":"N. Chang, V. Kanevsky, O. S. Nakagawa, K. Rahmat, Soo-Young Oh","doi":"10.1109/ICCD.1997.628944","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628944","url":null,"abstract":"In this paper, we describe a novel methodology for obtaining statistically-based worst case (i.e. 3-σ) R (resistance), C (capacitance), and delay given variations in interconnect-related process parameters. Our approach is based on a weighted root-sum square method to derive 3-σ C. A Monte Carlo-based method is used for the generation of 3-σ R as well as randomized distributed RC nets to obtain realistic 3-σ delays for long interconnect nets such as global critical paths. Using this methodology for a long critical net analysis on a 0.35 μm process, a more than 70% improvement in 3-σ delay estimation compared with the traditional skew-corner worst case delay can be realized.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121951294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous wrapper for heterogeneous systems","authors":"D. Bormann, P. Cheung","doi":"10.1109/ICCD.1997.628884","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628884","url":null,"abstract":"We propose a new method for creating globally asynchronous locally synchronous (GALS) circuits. Each locally synchronous module is surrounded by an \"asynchronous wrapper\" which provides an asynchronous interface to an otherwise synchronous circuit. Every locally synchronous (LS) region operates independently, minimising problems of clock skew and enabling regions to run at different clock speeds if desired. Metastability can never cause the system to fail because an asynchronous handshake \"stretches\" or \"pauses\" the local clock until data has stabilised. When new data is not available for processing, the local clock stretches, automatically preventing the LS block from consuming power. Once new data does arrive, the block responds directly in phase with the handshake without wasted synchronisation time. The LS modules can be designed using typical synchronous techniques. However, since the external interface to each LS block uses asynchronous handshaking, we can now freely mix synchronous and asynchronous circuits.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131801664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}