S. Moore, G. Taylor, Paul A. Cunningham, R. Mullins, Peter Robinson
{"title":"Self calibrating clocks for globally asynchronous locally synchronous systems","authors":"S. Moore, G. Taylor, Paul A. Cunningham, R. Mullins, Peter Robinson","doi":"10.1109/ICCD.2000.878271","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878271","url":null,"abstract":"We present a local clocking mechanism based on a tunable delay line which calibrates itself from a low frequency global clock. After initial tuning, the local clock remains calibrated when environmental conditions change. Each module of a large system on a chip can use one of these clock generators running at the optimal frequency for the module. Communication between locally synchronous blocks is provided by a globally asynchronous interconnect. Reliable low latency communication between the asynchronous interconnect and a local clock domain is made possible by stretching the local clock if a metastable condition could be encountered. Stretching the clock just requires the rising clock edge to be prevented from entering the tuned delay line. Similarly, a sleep state can be entered by stopping the clock and wakeup is almost instantaneous. Fine grained sleeping is possible by sleeping whenever there is no work to be undertaken and waking up as soon as new data appears over the asynchronous interconnect.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132392091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of inductive and resistive switching noise on power supply network in deep sub-micron CMOS circuits","authors":"Shiyou Zhao, K. Roy, Cheng-Kok Koh","doi":"10.1109/ICCD.2000.878270","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878270","url":null,"abstract":"In this paper, we propose an event-driven simulation based approach to estimate the worst case IR drop and L di/dt inductive noise on the power supply network. The switching noise is modeled as a weighted sum of the switching currents and the rates of change of the switching currents, where the weights are respectively the effective resistance and inductance (on the P/G network) experienced by each switching current. Monte Carlo and genetic algorithm are employed to search for the worst case input vector pair(s) that induce the maximum switching noise. The worst case input patterns are used in the SPICE simulation to verify the switching noise waveforms on the power supply network. Experimental results show that the worst case switching noise on the power supply network for ISCAS85 benchmark circuits implemented in TSMC 0.25 μm technology can be as high as 40% of the supply voltage V_{dd}.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"299 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128618998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pseudoexhaustive TPG with a provably low number of LFSR seeds","authors":"D. Kagaris, S. Tragoudas","doi":"10.1109/ICCD.2000.878267","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878267","url":null,"abstract":"Linear Feedback Shift Registers (LFSRs) are the most efficient and popular pseudo-exhaustive test pattern generation (TPG) mechanism. The goal is to minimize the required test length with low hardware overhead while obtaining pseudo-exhaustive TPG. Primitive characteristic polynomials are widely used because they require only one seed, but the candidate polynomials are few and our experiments show that often the pseudoexhaustive test length is prohibitive. In this paper, we present a novel pseudoexhaustive approach with a provably low number of seeds where the characteristic polynomial is the product of a primitive and an irreducible polynomial satisfying certain conditions. Our experimental results on the ISCAS'85 benchmarks show that using the proposed method requires very low hardware overhead. The list of characteristic polynomials for pseudoexhaustive TPG is greatly enhanced and our experiments show that pseudoexhaustive TPG is more feasible.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121365712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analog transient concurrent fault simulation with dynamic fault grouping","authors":"J. Hou, A. Chatterjee","doi":"10.1109/ICCD.2000.878266","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878266","url":null,"abstract":"Fast analog fault simulation is critical in test development and fault diagnosis for analog and mixed-signal circuits. It has been demonstrated that concurrent fault simulation methods can greatly reduce the computational complexity of analog fault simulation by sharing intermediate simulation results between different faults. In this paper we present an algorithm for dynamic fault grouping for transient fault simulation of nonlinear analog circuits. The goal of fault grouping in general is to minimize the total fault simulation running time for all faulty circuits while satisfying the simulation accuracy constraints. Fault grouping allows subsets of faults with similar transient response characteristics to be simulated concurrently for a given test stimulus. Time step increments for each fault group are adaptively selected to limit simulation error while maximizing simulation concurrency. Results of simulation performance and statistics on test circuits are presented.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123609739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Masaaki Kondo, H. Okawara, Hiroshi Nakamura, T. Boku
{"title":"SCIMA: Software controlled integrated memory architecture for high performance computing","authors":"Masaaki Kondo, H. Okawara, Hiroshi Nakamura, T. Boku","doi":"10.1109/ICCD.2000.878275","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878275","url":null,"abstract":"Processor performance has improved through clock acceleration and ILP extraction techniques. Main memory performance, however, has not improved nearly as much, and the performance gap between processor and memory will keep growing. This is a very serious problem in high performance computing because effective performance is limited by memory ability in most cases. In order to overcome this problem, we propose a new VLSI architecture called SCIMA which integrates software controllable memory into a processor chip. Most data accesses in high performance computing are regular, and software controllable memory is better suited than a conventional cache to exploiting this regularity. This paper presents the architecture and its performance evaluation. The evaluation results reveal the superiority of SCIMA over a conventional cache-based architecture.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124267011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Oskin, D. Franklin, J. Hensley, L. Lita, F. Chong
{"title":"Reducing cost and tolerating defects in page-based intelligent memory","authors":"M. Oskin, D. Franklin, J. Hensley, L. Lita, F. Chong","doi":"10.1109/ICCD.2000.878297","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878297","url":null,"abstract":"Active Pages is a page-based model of intelligent memory specifically designed to support virtualized hardware resources. Previous work has shown substantial performance benefits from offloading data-intensive tasks to a memory system that implements Active Pages. With a simple VLIW processor embedded near each page on DRAM, Active Page memory systems achieve up to 1000X speedups over conventional memory systems. In this study, we examine Active Page memories that share, or multiplex, embedded VLIW processors across multiple physical Active Pages. We explore the trade-off between individual page-processor performance and page-level multiplexing. We find that hardware costs of computational logic can be reduced from 31% of DRAM chip area to 12%, through multiplexing, without significant loss in performance. Furthermore, manufacturing defects that disable up to 50% of the page processors can be tolerated through efficient resource allocation and associative multiplexing.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126406463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient place and route for pipeline reconfigurable architectures","authors":"S. Cadambi, S. Goldstein","doi":"10.1109/ICCD.2000.878318","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878318","url":null,"abstract":"In this paper, we present a fast and efficient compilation methodology for pipeline reconfigurable architectures. Our compiler back-end is much faster than conventional CAD tools, and fairly efficient. We represent pipeline reconfigurable architectures by a generalized VLIW-like model. The complex architectural constraints are effectively expressed in terms of a single graph parameter: the routing path length (RPL). Compiling to our model using RPL, we demonstrate fast compilation times and show speedups of between 10x and 200x on a pipeline reconfigurable architecture when compared to an UltraSparc-II.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125002559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of channeled DRAM memory architectures","authors":"L. Friebe, Y. Yabe, M. Motomura","doi":"10.1109/ICCD.2000.878295","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878295","url":null,"abstract":"Channeled DRAM features small on-chip buffers called channels that are placed in front of the DRAM core. In this study various techniques to efficiently control the channels were investigated. Different techniques of caching and prefetching were adapted to the unique features of Channeled DRAM. An existing execution-driven processor simulator was extended by a memory simulation library and three benchmarks were run on four different memory system configurations of this simulator to evaluate the performance of the different control strategies. The results show that using Channeled DRAM as a replacement for conventional SDRAM improves memory system performance by reducing the average access latency by up to 50%.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129376015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance low-power CMOS circuits using multiple channel length and multiple oxide thickness","authors":"N. Sirisantana, Liqiong Wei, K. Roy","doi":"10.1109/ICCD.2000.878290","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878290","url":null,"abstract":"Power optimization has become an important issue for high performance designs. One way to achieve low-power and high performance circuits is to use dual-threshold voltages. High threshold transistors can be used in non-critical paths to reduce the leakage power, while lower threshold voltage is used for transistors in critical path(s) to achieve high performance. This paper proposes two low power and high performance CMOS design techniques, multiple channel length (M_{L}CMOS) and multiple oxide thickness (M_{ox}CMOS), based on the dual V_{th} design technique. A comprehensive algorithm for selecting and assigning optimal transistor threshold voltage, channel length and oxide thickness is given. The simulation results on ISCAS benchmark circuits show that the total power consumption can be reduced by 21% for M_{L}CMOS at low activity. Total power savings for M_{ox}CMOS at low and high switching activities are about 42% and 24%, respectively.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125804578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Solomatnikov, D. Somasekhar, K. Roy, Cheng-Kok Koh
{"title":"Skewed CMOS: Noise-immune high-performance low-power static circuit family","authors":"A. Solomatnikov, D. Somasekhar, K. Roy, Cheng-Kok Koh","doi":"10.1109/ICCD.2000.878292","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878292","url":null,"abstract":"In this paper, we present a noise-immune high-performance static circuit family suitable for low-voltage operation called skewed logic. Skewed logic circuits, in comparison with Domino logic, have better scalability, and they are more suitable for low voltage applications because of better noise margin. Skewed logic has been compared with Domino logic in terms of delay, power, and dynamic noise immunity. A design methodology for skewed CMOS pipelined circuits has been developed. Comparisons between skewed and Domino circuits on a 0.25 μm, 700 MHz, 16×16-bit pipelined multiplier show superior properties of skewed circuits over Domino in terms of clock power dissipation and peak current consumption.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116435794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}