{"title":"Pollution control caching","authors":"S. J. Walsh, J. Board","doi":"10.1109/ICCD.1995.528825","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528825","url":null,"abstract":"The bandwidth mismatch of today's high speed processors and standard DRAMs is between a factor of 10 and 50. From 1995 to the year 2000 this mismatch is expected to grow to three orders of magnitude, necessitating greater emphasis for on-chip caches. Today on-chip caches typically consume from 20% to 50% of the total chip area and their cost is mostly a function of the chip area they consume. Clearly, any technique which can maintain memory performance and reduce chip area requirements is extremely important. In this paper we present two novel cache architectures called pollution control caching (PCC) and pollution control caching plus victim buffering (PCC+VB). We have used trace driven simulation to obtain miss ratio statistics and we developed analytical models of the expected clock cycles per instruction (E[CPI]) for each architecture and cache size studied. Analytical models were parameterized with the results of our trace driven simulation. These models incorporate provisions to study the effect that on-chip cache size has on access time, and the effect that this and different main memory latencies have on the E[CPI]. Chip area models were also developed for each architecture and used as a basis for comparison. Finally, we used ANOVA techniques to better quantify the differences in the miss rate performance of the cache sizes and cache architectures studied. Our research has shown that, given the constraints of our design space, PCC+VB equipped caches can match the miss rate performance and E[CPI] of direct mapped caches that require greater than five times the silicon area.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130186793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthesis for testability of large complexity controllers","authors":"F. Fummi, D. Sciuto, M. Serro","doi":"10.1109/ICCD.1995.528808","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528808","url":null,"abstract":"Specification of large complexity controllers in industrial design environments is performed by means of a top-down methodology leading to a description based on a hierarchy of FSMs. This paper presents a set of algorithms which compare such hierarchical descriptions with their structural implementations to produce irredundant circuits for which test patterns are easily derived. These algorithms can be inserted into any commercial design flow, based on VHDL descriptions, thus creating a synthesis for testability environment which provides testable and optimized gate-level descriptions.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134294742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theorem proving: not an esoteric diversion, but the unifying framework for industrial verification","authors":"D. Cyrluk, M. Srivas","doi":"10.1109/ICCD.1995.528920","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528920","url":null,"abstract":"The effectiveness of hardware verification techniques has increased markedly in the past decade. As hardware verification techniques become increasingly powerful the idea of transitioning verification technology to industry can be taken seriously. Nevertheless, powerful decision procedures that can completely automate the verification of certain types of hardware, whether they are BDD based model-checkers or automatic microprocessor verification tools, cannot be adequate on their own for industrial hardware verification. However, a high-level, general-purpose theorem prover with specific capabilities can provide an overall framework in which these tools can be embedded and in which they can then be effectively used for industrial hardware verification.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133709322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extraction of finite state machines from transistor netlists by symbolic simulation","authors":"Manish Pandey, Alok K. Jain, R. Bryant, D. Beatty, G. York, Samir Jain","doi":"10.1109/ICCD.1995.528929","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528929","url":null,"abstract":"The paper describes a new technique for extracting clock level finite state machines (FSMs) from transistor netlists using symbolic simulation. The transistor netlist is preprocessed to produce a gate level representation of the netlist. Given specifications of the circuit clocking and input and output timing, simulation patterns are derived for a symbolic simulator. The result of the symbolic simulation and extraction process is the next state and output function of the equivalent FSM, represented as Ordered Binary Decision Diagrams. Compared to previous techniques, our extraction process yields an order of magnitude improvement in both space and time, is fully automated and can handle static storage structures and time multiplexed inputs and outputs.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115988233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient systolic array for the discrete cosine transform based on prime-factor decomposition","authors":"Hyesook Lim, E. Swartzlander","doi":"10.1109/ICCD.1995.528936","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528936","url":null,"abstract":"A new design of a systolic array for computing the discrete cosine transform (DCT) based on prime-factor decomposition is presented. The basic principle of the proposed systolic array is that one-dimensional (1-D) DCT can be decomposed to a 2-dimensional (2-D) DCT by input and output index mappings and the 2-D DCT is computed efficiently on a 2-D systolic array. We modify Lee's input index mapping method in order to construct one input mapping table instead of three input index mapping tables. The proposed systolic array avoids the need for the array transposer that was required by earlier implementations for the prime-factor DCT algorithms, and thus all processing can be pipelined. The proposed design of systolic array provides a simple and regular structure, which is well suited for VLSI implementation.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131085554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A coprocessor for accurate and reliable numerical computations","authors":"M. Schulte, E. Swartzlander","doi":"10.1109/ICCD.1995.528942","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528942","url":null,"abstract":"This paper presents the architecture and hardware design of a special-purpose coprocessor that performs variable-precision, interval arithmetic. Variable-precision arithmetic allows the precision of the computation to be specified, based on the problem to be solved and the required accuracy of the results. Interval arithmetic produces two values for each result, such that the true result is guaranteed to be between the two values. The coprocessor gives the programmer the ability to specify the precision of the computation, determine the accuracy of the results, and recompute inaccurate results with higher precision. Direct hardware support for variable-precision, interval arithmetic greatly improves the accuracy and reliability of numerical computations. Execution time estimates indicate that the coprocessor is two to three orders of magnitude faster than an existing software package for variable-precision, interval arithmetic.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130253608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA global routing based on a new congestion metric","authors":"Yao-Wen Chang, D. F. Wong, Chak-Kuen Wong","doi":"10.1109/ICCD.1995.528836","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528836","url":null,"abstract":"Unlike traditional ASIC routing, the feasibility of routing in FPGAs is constrained not only by the available space within a routing region, but also by the routing capacity of a switch block. Recent work has established the switch-block capacity as a superior congestion-control metric for FPGA global routing. However, the work has two deficiencies: (1) its algorithm for computing the switch-block capacity is not efficient, and (2) it, as well as the other recent works, only modeled one type of routing segment: single-length lines. To remedy the deficiencies, we present in this paper efficient algorithms for obtaining the switch-block capacity and a graph modeling for routing on the new generation FPGAs with a versatile set of segment lengths. Experiments show that our algorithms dramatically reduce the run times for obtaining the switch-block capacities. Experiments with a global router based on the switch-block and channel densities for congestion control show a significant improvement in the area performance, compared with one based on the traditional congestion metric.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116162403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a 100 MHz centralized instruction window for a superscalar microprocessor","authors":"S. Wallace, N. Dagli, N. Bagherzadeh","doi":"10.1109/ICCD.1995.528796","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528796","url":null,"abstract":"The maxim of the superscalar architecture is that higher performance can be achieved by executing multiple instructions simultaneously. This can be realized on hardware by using a centralized instruction window. We present the design and implementation of a centralized instruction window capable of out-of-order issue and completion of four instructions per cycle. A compact layout (6.4 mm by 2.2 mm) of a 32-entry instruction window resulted from a full-custom design in 1.0 μm (drawn) 3-layer metal CMOS technology. The layout was verified by simulation and shown to operate at a clock frequency over 100 MHz.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"32 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114015378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A symbolic-simulation approach to the timing verification of interacting FSMs","authors":"A. J. Daga, W. Birmingham","doi":"10.1109/ICCD.1995.528927","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528927","url":null,"abstract":"A timing verifier that scales to verify complex sequential circuits, modeled in terms of interacting FSMs, while rejecting false sequential and combinational paths has, so far, not been developed. We present an algorithm for this purpose. The inherently modular nature of interactions among FSMs allows a highly efficient symbolic simulation verification methodology. Experimental results illustrate this methodology's ability to scale, while providing accurate timing verification results.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131134933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending VLSI design with higher-order logic","authors":"Anand Chavan, Shiu-Kai Chin, Shahid Ikram, J. Kim, Juin-Yeu Zu","doi":"10.1109/ICCD.1995.528795","DOIUrl":"https://doi.org/10.1109/ICCD.1995.528795","url":null,"abstract":"Extending VLSI CAD with higher-order logic integrates formal verification with synthesis. The benefits of doing so are: 1) relating instruction-set descriptions to implementations, 2) designing at a higher level of abstraction than at the level of schematics, 3) verifying by proof, 4) reusing verified parameterized designs, 5) automatically compiling designs in higher-order logic to parameterized cell generators and layouts, and 6) validating electrical and functional properties by simulation. Such an integration is demonstrated by linking the Cambridge Higher-Order Logic (HOL) theorem-prover with the Mentor Graphics GDT design environment. We illustrate its applications by creating a parameterized macro-cell generator for an n-bit Am2910 microprogram sequencer whose design is formally verified with respect to its instruction-set architecture specification.","PeriodicalId":281907,"journal":{"name":"Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121848736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}