{"title":"Multilevel integral equation methods for the extraction of substrate coupling parameters in mixed-signal IC's","authors":"M. Chou, Jacob K. White","doi":"10.1145/277044.277049","DOIUrl":"https://doi.org/10.1145/277044.277049","url":null,"abstract":"The extraction of substrate coupling resistances can be formulated as a first-kind integral equation, which requires only discretization of the two-dimensional contacts. However, the result is a dense matrix problem which is too expensive to store or to factor directly. Instead, we present a novel, multigrid iterative method which converges more rapidly than previously applied Krylov-subspace methods. At each level in the multigrid hierarchy, we avoid dense matrix-vector multiplication by using moment-matching approximations and a sparsification algorithm based on eigendecomposition. Results on realistic examples demonstrate that the combined approach is up to an order of magnitude faster than a Krylov-subspace method with sparsification, and orders of magnitude faster than not using sparsification at all.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129853833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A mixed nodal-mesh formulation for efficient extraction and passive reduced-order modeling of 3D interconnects","authors":"N. Marques, M. Kamon, Jacob K. White, L. M. Silveira","doi":"10.1145/277044.277132","DOIUrl":"https://doi.org/10.1145/277044.277132","url":null,"abstract":"As VLSI circuit speeds have increased, reliable chip and system design can no longer be performed without accurate three-dimensional interconnect models. In this paper, we describe an integral equation approach to modeling the impedance of interconnect structures accounting for both the charge accumulation on the surface of conductors and the current traveling in their interior. Our formulation, based on a combination of nodal and mesh analysis, has the required properties to be combined with Model Order Reduction techniques to generate accurate and guaranteed passive low order interconnect models for efficient inclusion in standard circuit simulators. Furthermore, the formulation is shown to be more flexible and efficient than previously reported methods.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124424293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite state machine decomposition for low power","authors":"J. Monteiro, Arlindo L. Oliveira","doi":"10.1145/277044.277235","DOIUrl":"https://doi.org/10.1145/277044.277235","url":null,"abstract":"Clock-gating techniques have been shown to be very effective in the reduction of the switching activity in sequential logic circuits. The authors describe a new clock-gating technique based on finite state machine (FSM) decomposition. They compute two sub-FSMs that together have the same functionality as the original FSM. For all the transitions within one sub-FSM, the clock for the other sub-FSM is disabled. To minimize the average switching activity, they search for a small cluster of states with high stationary state probability and use it to create the small sub-FSM. This way one will have a small amount of logic that is active most of the time, during which it disables a much larger circuit, the other sub-FSM. They provide a set of experimental results that show that power consumption can be substantially reduced, in some cases up to 80%.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124509100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FACT: a framework for the application of throughput and power optimizing transformations to control-flow intensive behavioral descriptions","authors":"G. Lakshminarayana, N. Jha","doi":"10.1109/DAC.1998.724448","DOIUrl":"https://doi.org/10.1109/DAC.1998.724448","url":null,"abstract":"In this paper, we present an algorithm for the application of a general class of transformations to control-flow intensive behavioral descriptions. Our algorithm is based on the observation that incorporation of scheduling information can help guide the selection and application of candidate transformations, and significantly enhance the quality of the synthesized solution. The efficacy of the selected throughput and power optimizing transformations is enhanced by the ability of our algorithm to transcend basic blocks in the behavioral description. This ability is imparted to our algorithm by a general technique we have devised. Our system currently supports associativity, commutativity, distributivity, constant propagation, code motion, and loop unrolling. It is integrated with a scheduler which performs implicit loop unrolling and functional pipelining, and has the ability to parallelize the execution of independent iterative constructs whose bodies can share resources. Other transformations can easily be incorporated within the framework. We demonstrate the efficacy of our algorithm by applying it to several commonly available benchmarks. Upon synthesis, behaviors transformed by the application of our algorithm showed up to 6-fold improvement in throughput over an existing transformation algorithm, and up to 4.5-fold improvement in power over designs produced without the benefit of our algorithm.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122971111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A practical repeater insertion method in high speed VLSI circuits","authors":"Julian Culetu, C. Amir, J. MacDonald","doi":"10.1145/277044.277151","DOIUrl":"https://doi.org/10.1145/277044.277151","url":null,"abstract":"In today's design of VLSI high speed circuits, frequency has a major impact on the number of repeaters that need to be inserted. A microprocessor operating at less than 200 MHz might require several hundred repeaters, while one operating at greater than 500 MHz may require a number in the thousands. The following paper describes an efficient and simple way to automatically determine buffer placement based on maintaining equal transition time for all gate input signals across the net. A maximum allowable transition time is determined (limited by the frequency of the circuit), and correlated with the interconnect Elmore Delay. A Spice RC model having nodes with physical locations (X, Y coordinates) can be obtained by extraction tools providing standard parasitic format (SPF). This can then be used with the results of the algorithm for repeater placement to determine the exact physical location desired for each repeater.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128352039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Digital system simulation: methodologies and examples","authors":"K. Olukotun, M. Heinrich, D. Ofelt","doi":"10.1145/277044.277212","DOIUrl":"https://doi.org/10.1145/277044.277212","url":null,"abstract":"Simulation serves many purposes during the design cycle of a digital system. In the early stages of design, high-level simulation is used for performance prediction and analysis. In the middle of the design cycle, simulation is used to develop the software algorithms and refine the hardware. In the later stages of design, simulation is used to make sure performance targets are reached and to verify the correctness of the hardware and software. The different simulation objectives require varying levels of modeling detail. To keep design time to a minimum, it is critical to structure the simulation environment to make it possible to trade off simulation performance for model detail in a flexible manner that allows concurrent hardware and software development. In this paper we describe the different simulation methodologies for developing complex digital systems, and give examples of one such simulation environment.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123583703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Code compression for embedded systems","authors":"H. Lekatsas, W. Wolf","doi":"10.1145/277044.277185","DOIUrl":"https://doi.org/10.1145/277044.277185","url":null,"abstract":"Memory is one of the most restricted resources in many modern embedded systems. Code compression can provide substantial savings in terms of size. In a compressed code CPU, a cache miss triggers the decompression of a main memory block, before it gets transferred to the cache. Because the code must be decompressible starting from any point (or at least at cache block boundaries), most file-oriented compression techniques cannot be used. We propose two algorithms to compress code in a space-efficient and simple to decompress way, one which is independent of the instruction set and another which depends on the instruction set. We perform experiments on true instruction sets, a typical RISC (MIPS) and a typical CISC (x86) and compare our results to existing file-oriented compression algorithms.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122938206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using complementation and resequencing to minimize transitions","authors":"R. Murgai, M. Fujita, Arlindo L. Oliveira","doi":"10.1145/277044.277219","DOIUrl":"https://doi.org/10.1145/277044.277219","url":null,"abstract":"In (Murgai et al., 1997) the following problem was addressed: given a set of data words or messages to be transmitted over a bus such that the sequence (order) in which they are transmitted is irrelevant, determine the optimum sequence that minimizes the total number of transitions on the bus. Stan and Burleson (1994) presented the bus-invert method as a means of encoding words for reducing I/O power, in which a word may be inverted and then transmitted if doing so reduces the number of transitions. In this paper, we combine the two paradigms into one: that of sequencing words under the bus-invert scheme for the minimum transitions, i.e., words can be complemented, reordered and then transmitted. We prove that this problem, DOPI (Data Ordering Problem with Inversion), is NP-complete. We present a polynomial-time approximation algorithm to solve DOPI that comes within a factor of 1.5 from the optimum. Experimental results show that, on average, the solutions generated by our algorithm were within 4.4% of the optimum, and that resequencing along with complementation leads to 34.4% reduction in switching activity.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"39 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128161390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-pad power/ground network design for uniform distribution of ground bounce","authors":"Jaewon Oh, Massoud Pedram","doi":"10.1109/DAC.1998.724484","DOIUrl":"https://doi.org/10.1109/DAC.1998.724484","url":null,"abstract":"This paper presents a method for power and ground (p/g) network routing for high speed CMOS chips with multiple p/g pads. Our objective is not to reduce the total amount of the ground bounce, but to distribute it more evenly among the pads while the routing area is kept to a minimum. We first show that proper p/g terminal to pad assignment is necessary to reduce the maximum ground bounce and then present a heuristic for performing simultaneous assignment and p/g net routing. Experimental results demonstrate the effectiveness of our method.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114619179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal FPGA mapping and retiming with efficient initial state computation","authors":"J. Cong, Chang Wu","doi":"10.1109/DAC.1998.724492","DOIUrl":"https://doi.org/10.1109/DAC.1998.724492","url":null,"abstract":"For sequential circuits with given initial states, new equivalent initial states must be computed for retiming, which unfortunately is NP-hard. In this paper we propose a novel polynomial time algorithm for optimal FPGA mapping with forward retiming to minimize the clock period with guaranteed initial state computation. It enables a new methodology of separating forward retiming from backward retiming to avoid time-consuming iterations between retiming and initial state computation. Our algorithm compares very favorably with both of the conventional approaches of separate mapping followed by retiming and the recent approaches of combined mapping with retiming. It is also applicable to circuits with partial initial state assignment.","PeriodicalId":221221,"journal":{"name":"Proceedings 1998 Design and Automation Conference. 35th DAC. (Cat. No.98CH36175)","volume":"355 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121630208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}