{"title":"High performance CMOS circuit techniques for the G-4 S/390 microprocessor","authors":"J. Warnock, L. Sigal, B. Curran, Y. Chan","doi":"10.1109/ICCD.1997.628875","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628875","url":null,"abstract":"This paper describes the CMOS circuit techniques used in the design of the high performance Generation-4 S/390 microprocessor. Successful system operation at frequencies up to 400 MHz was achieved through careful static circuit design and timing optimization, along with the limited use of dynamic circuits for highly critical functions, and several different clocking/latching strategies for cycle time reduction. A variety of innovative full-custom circuit techniques were used in the dataflow design. Timing-driven synthesis of the control logic provided maximum flexibility with minimum turn-around time, while still matching the performance level set by the custom parts of the design. The on-chip L1 cache was designed extensively with self-resetting CMOS (SRCMOS) circuitry to provide a 2.0 ns access time and up to 500 MHz operation.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124254458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is wireless data dead?","authors":"R. Katz","doi":"10.1109/ICCD.1997.628927","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628927","url":null,"abstract":"Summary form only given. In this presentation, we explore in greater detail the challenges faced by wireless data services, and some of the technology developments and possible solutions that lead us to be optimistic that wireless data is not yet dead, and in fact, has a very promising future. In particular, new spectrum allocations, coupled with integrated circuit technology breakthroughs, will enable much higher data rates. For example, the FCC has recently allocated spectrum for the Unlicensed NII Band at 5.15 GHz (350 MHz) and in the 60 GHz band (an incredible 5 GHz of available spectrum). Furthermore, ubiquitous digital cellular telephones will provide a widely available, flexible, and moderate rate digital channel for voice and data over the wide-area. And new networking technologies, in particular, wireless overlay networks and spectrum sharing techniques, will make it possible to maintain connectivity as a user moves from room-sized wireless networks, to building-sized networks, to the metropolitan, wide-area, and regional networks.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115980162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 400 MHz, 144 Kb CMOS ROM macro for an IBM S/390-class microprocessor","authors":"A. Tuminaro","doi":"10.1109/ICCD.1997.628876","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628876","url":null,"abstract":"A high performance 2K × 72 CMOS ROM for fetching the most frequently used complex instruction code in a high speed S/390-class microprocessor is described in this paper. The ROM has a nominal access/cycle time of 2.3 ns/2.5 ns and is physically organized as 128 word lines by 1152 bit lines. Personalization is done at the gate level of the device. The design uses IBM CMOS6S technology, which features Leff = 0.2 μm and a 2.5 V power supply. Several innovative circuit techniques were employed to achieve the aggressive ROM access/cycle time. Each stage in the access path is dynamically reset, thereby avoiding the use of a centralized clock circuit and also yielding a fast cycle time. The ROM macro features a dynamic reference source and a sense amplifier that allow single-ended sensing of a bit line. The sense amplifier clock is also generated from the decoded word line through an OR tree, so the access time tracks the loading on the decoded word line. The macro occupies 3300 × 715 μm² and the array cell occupies 2 × 2 μm². Less than 10% of the ROM macro area is dedicated to ABIST circuitry, which allows extensive test coverage.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116013594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Critical voltage transition logic: an ultrafast CMOS logic family","authors":"Zhang‐ming Zhu, B. Carlson","doi":"10.1109/ICCD.1997.628946","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628946","url":null,"abstract":"The authors present a new kind of CMOS logic circuit whose structure and operating mechanism differ from those of existing logic circuits. Its unique delay propagation characteristic makes it much faster than a conventional CMOS logic gate. Gate outputs are preconditioned to a voltage level between Vdd and Vss using a new clocking scheme and circuit design. They give a buffer design example that is about 6.5 times faster than the conventional buffer. The total energy consumed by the new circuit structure is slightly more than that of conventional CMOS domino logic; however, the energy-delay product is smaller.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128446391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory traffic and data cache behavior of an MPEG-2 software decoder","authors":"Peter Soderquist, M. Leeser","doi":"10.1109/ICCD.1997.628903","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628903","url":null,"abstract":"The authors investigate the impact of multimedia applications on the cache behavior of desktop systems. Specifically, they consider the memory bandwidth and data cache challenges associated with MPEG-2 software decoding. Recent extensions to instruction set architectures, including Intel's MMX, address the computational aspects of MPEG decoding. The large amount of data traffic generated, however, has received little attention. Standard data caches consistently generate an excess of cache-memory traffic. At best, varying basic cache parameters only reduces traffic to double the required minimum. Incremental changes in cache size have a negligible effect for most feasible values. Increasing set associativity yields rapidly diminishing returns, and manipulating line size is similarly unproductive. Achieving higher efficiency requires understanding the composition and behavior of the decoder data set. They present a model of MPEG-2 decoder memory behavior and describe how to exploit this knowledge to minimize required memory bandwidth. Their results show that simply eliminating one component, video output data, from the cache can reduce traffic by as much as 50 percent.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"84 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120824826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Timed binary decision diagrams","authors":"Zhongcheng Li, Yuhong Zhao, Y. Min, R. Brayton","doi":"10.1109/ICCD.1997.628894","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628894","url":null,"abstract":"The paper presents an extension to OBDDs with timing information, called timed binary decision diagrams (TBDDs). TBDDs are also canonical and allow the symbolic manipulation of Boolean functions with timing information. A TBDD software package is implemented based on the existing CMU BDD package. Experimental results demonstrate the efficiency of the TBDDs in representing circuits with both functional and timing information.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121737076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A repeater optimization methodology for deep sub-micron, high-performance processors","authors":"David Li, Andrew Pua, Pranjal Srivastava, U. Ko","doi":"10.1109/ICCD.1997.628945","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628945","url":null,"abstract":"As process technology scales down to deep sub-micron and the frequency of high-performance processors increases beyond 300 MHz, coupling-induced signal integrity problems become more severe. Ignoring coupling effects can lead to functional failures or speed degradation. As a result, the traditional approach of repeater insertion driven by propagation delay and slew rate optimization becomes inadequate. The authors propose a design methodology to select optimal repeaters for high-performance processors by considering not only delay and slew rate, but also crosstalk effects. They further propose a concurrent decision diagram (CDD) to meet crosstalk constraints under various trade-offs.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116625420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A pulse-to-static conversion latch with a self-timed control circuit","authors":"W. Hwang, R. Joshi, W. Henkels","doi":"10.1109/ICCD.1997.628943","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628943","url":null,"abstract":"The design and experimental demonstration of a low-power pulse-to-static conversion latch circuit is described. The circuit includes self-timed control and a 64-bit latch array, both designed using self-resetting CMOS (SRCMOS) circuit techniques. Because the control is self-timed, it requires only one system clock input. The evaluation, reset and write-enable controls are all generated within a control macro. The latch is level sensitive scan design (LSSD) compatible and complies with SRCMOS test modes. Use of these latches facilitates the synchronization, pipelined operation, power management, and testing of advanced digital systems that employ a mix of static and dynamic circuits to achieve high performance. An experimental 64-bit latch array and self-timed control macro, designed for a 2.5 V, 0.5 μm CMOS technology, has been successfully fabricated and tested. The full circuit occupies an area of 1.704 mm × 0.07 mm, and the latch bit cell measures 21.6 μm × 70 μm. Experimental results show that the conversion latch functions properly, capturing 1.2 ns output pulses from an SRCMOS register file and converting them to static levels. The measured delay from global clock to static output was 725 ps.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130136842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of maximum power for sequential circuits considering spurious transitions","authors":"Chuan-Yu Wang, K. Roy","doi":"10.1109/ICCD.1997.628948","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628948","url":null,"abstract":"With the high demand for reliability and performance, accurate estimation of maximum instantaneous power dissipation in CMOS circuits is essential to determine the IR drop on supply lines and to optimize the power and ground routing. Unfortunately, the problem of determining the input patterns that induce maximum current, and hence maximum power, is NP-complete. Even for circuits with a small number of primary inputs (PIs), searching the input vector space efficiently is CPU-time intensive. The authors present an automatic test generation (ATG) based technique to efficiently generate tight lower bounds on the maximum instantaneous power of CMOS sequential circuits under non-zero gate delays. Power dissipation due to spurious transitions is accounted for by incorporating static timing analysis into the estimation process. Experiments were performed on ISCAS and MCNC benchmarks. Results show that the ATG-based technique is superior to the traditional simulation-based technique in both speed and performance. On average, for sequential circuits with over 10,000 gates (ISCAS-89 benchmarks), the ATG-based approach executes 981 times faster and generates a lower bound that is 1.8 times better than simulation-based approaches.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125390215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power driven partial scan","authors":"Jing-Yang Jou, Ming-Chang Nien","doi":"10.1109/ICCD.1997.628933","DOIUrl":"https://doi.org/10.1109/ICCD.1997.628933","url":null,"abstract":"Power consumption and testability are two of the major considerations in modern VLSI design. The full-scan method has been widely used to improve the testability of sequential circuits. Because of its lower overheads, partial-scan design has gradually become popular. The authors propose a partial-scan selection strategy, based on structural analysis, that considers area and power overheads simultaneously. A sample-and-search algorithm is used to find the solution that minimizes a user-specified cost function in terms of power and area overheads. The experimental results show that the sample-and-search algorithm effectively finds the best solution of the specified cost function for almost all circuits, and that the average overhead savings for each cost function are significant.","PeriodicalId":154864,"journal":{"name":"Proceedings International Conference on Computer Design VLSI in Computers and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128823671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}