Proceedings Seventeenth Conference on Advanced Research in VLSI: Latest Papers

A high-speed asynchronous decompression circuit for embedded processors

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634856

Martin Benes, A. Wolfe, S. Nowick

Abstract: This paper describes the architecture and implementation of a high-speed decompression engine for embedded processors. The engine is targeted to processors where embedded programs are stored in compressed form and decompressed at runtime during instruction cache refill. The decompression engine uses a unique asynchronous variable-decompression-rate architecture to process Huffman-encoded instructions. The resulting circuit is significantly smaller than comparable synchronous decoders, yet has a higher throughput rate than almost all existing designs. The 0.8 µm layout is all full-custom and contains predominantly dynamic domino logic. The top-level control and several small state machines are implemented using asynchronous logic. The design operates without a user-supplied clock. Simulations using Lsim show an average throughput of 32 bits/45 ns on the output side, corresponding to about 480 Mbit/s on the input side. The chip has been manufactured by MOSIS; tests show that the asynchronous implementation operates correctly, with an average throughput exceeding simulations: 32 bits/39 ns on the output side, corresponding to about 560 Mbit/s on the input side. This speed is acceptable for our application. The area of the design (excluding the pad-frame overhead) is only 0.75 mm². The design is the first fabricated chip for an instruction decompression unit for embedded processors.
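The "variable decompression rate" follows from how Huffman (prefix-free) codes work: codewords have different lengths, so the number of compressed bits consumed per output symbol varies. The Python sketch below decodes a bit string with a small prefix-code table and reports how many bits each symbol consumed. It is a minimal software illustration, not the chip's asynchronous hardware architecture; the `CODE_TABLE` contents, the byte-sized symbols, and the function name `huffman_decode` are hypothetical assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical prefix-free code table (codeword bits -> byte value).
# Illustrative only: a real instruction-compression scheme would derive
# its table from the statistics of the program being compressed.
CODE_TABLE: Dict[str, int] = {
    "0": 0x00,
    "10": 0x24,
    "110": 0x8F,
    "111": 0xAC,
}


def huffman_decode(bits: str, table: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Decode a string of '0'/'1' characters using a prefix-free code table.

    Returns the decoded symbols and, for each symbol, the number of input
    bits it consumed (which varies from symbol to symbol).
    """
    symbols: List[int] = []
    lengths: List[int] = []
    current = ""
    for bit in bits:
        current += bit
        # Prefix property: the first table hit is a complete codeword.
        if current in table:
            symbols.append(table[current])
            lengths.append(len(current))
            current = ""
    if current:
        raise ValueError(f"trailing bits do not form a complete codeword: {current!r}")
    return symbols, lengths


if __name__ == "__main__":
    syms, lens = huffman_decode("0101101110", CODE_TABLE)
    print([hex(s) for s in syms], lens)
```

Here the five decoded bytes consume 1, 2, 3, 3, and 1 input bits respectively, so a decoder that emits output at a steady rate must accept input at a varying rate. A hardware decoder would likely examine several bits per step rather than one bit at a time.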

Citations: 26

Trends of key advanced device technologies

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634847

B. C. Hwang

Citations: 1

Pipelined multi-queue management in a VLSI ATM switch chip with credit-based flow-control

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634851

Georgios Kornaros, C. Kozyrakis, Panagiota Vatsolaki, M. Katevenis

Abstract: We describe the queue management block of ATLAS I, a single-chip ATM switch (router) with optional credit-based (backpressure) flow control. ATLAS I is a 4-million-transistor 0.35-micron CMOS chip, currently under development, offering 20 Gbit/s aggregate I/O throughput, sub-microsecond cut-through latency, a 256-cell shared buffer containing multiple logical output queues, priorities, multicasting, and load monitoring. The queue management block of ATLAS I is a dual parallel pipeline that manages the multiple queues of ready cells, the per-flow-group credits, and the cells that are waiting for credits. All cells, in all queues, share one common buffer space. These 3- and Q-stage pipelines handle events at the rate of one cell arrival or departure per clock cycle, and one credit arrival per clock cycle. The queue management block consists of two compiled SRAMs, pipeline bypass logic, and multi-port CAM and SRAM blocks that are laid out in full-custom and support special access operations. The full-custom part of queue management contains approximately 65 thousand transistors in logic and 14 Kbits in various special memories; it occupies 2.3 mm², consumes 270 mW (worst case), and operates at 80 MHz (worst case), versus the 50 MHz clock frequency required to support the 622 Mb/s switch link rate.
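Many logical queues sharing one buffer space, with cells held back until their flow group has a credit, can be sketched in software as per-queue linked lists threaded through a single slot array, a free list of unused slots, and a per-group credit counter. This is a generic Python illustration under simplifying assumptions (one operation at a time, string cells, hypothetical class names `SharedBufferQueues` and `CreditGate`); it does not model ATLAS I's pipelining, CAM lookups, priorities, or multicasting.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class SharedBufferQueues:
    """Several logical FIFO queues stored in one shared cell buffer.

    Each slot holds a cell plus a 'next' pointer; each queue keeps head/tail
    slot indices (-1 means empty); unused slots sit on a free list.
    """

    def __init__(self, num_slots: int, num_queues: int) -> None:
        self.cells: List[Optional[str]] = [None] * num_slots
        self.next: List[int] = [-1] * num_slots
        self.free: Deque[int] = deque(range(num_slots))
        self.head: List[int] = [-1] * num_queues
        self.tail: List[int] = [-1] * num_queues

    def enqueue(self, q: int, cell: str) -> bool:
        if not self.free:
            return False  # shared buffer is full
        slot = self.free.popleft()
        self.cells[slot] = cell
        self.next[slot] = -1
        if self.tail[q] == -1:
            self.head[q] = slot  # queue was empty
        else:
            self.next[self.tail[q]] = slot  # link after current tail
        self.tail[q] = slot
        return True

    def dequeue(self, q: int) -> Optional[str]:
        slot = self.head[q]
        if slot == -1:
            return None  # queue is empty
        cell = self.cells[slot]
        self.head[q] = self.next[slot]
        if self.head[q] == -1:
            self.tail[q] = -1
        self.cells[slot] = None
        self.free.append(slot)  # slot becomes available to any queue
        return cell


class CreditGate:
    """Per-flow-group credit counters: a cell may depart only while its group holds a credit."""

    def __init__(self, initial_credits: Dict[int, int]) -> None:
        self.credits: Dict[int, int] = dict(initial_credits)

    def try_send(self, group: int) -> bool:
        if self.credits.get(group, 0) > 0:
            self.credits[group] -= 1
            return True
        return False  # no credit: the cell must wait

    def credit_arrival(self, group: int) -> None:
        self.credits[group] = self.credits.get(group, 0) + 1
```

Because freed slots return to a common free list, a slot released by one queue can immediately hold a cell for another; no queue owns a fixed partition of the buffer.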

Citations: 26

Circuits and technology for Digital's StrongARM and ALPHA microprocessors [CMOS technology]

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634842

D. Dobberpuhl

Citations: 17

The design of an asynchronous MIPS R3000 microprocessor

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634853

Alain J. Martin, Andrew Lines, R. Manohar, M. Nyström, P. Pénzes, Robert Southworth, U. Cummings

Citations: 337

Fault scanner for reconfigurable logic

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634857

N. Shnidman, W. Mangione-Smith, M. Potkonjak

Citations: 10

An embedded DRAM for CMOS ASICs

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634861

J. Poulton

Abstract: The growing gap between on-chip gates and off-chip I/O bandwidth argues for ever larger amounts of on-chip memory. Emerging portable consumer technology, such as digital cameras, will also require more memory than can be supported easily on logic-oriented ASIC processes. Most ASIC memory systems are P-load SRAM, but this circuit technology is neither dense nor power-efficient. This paper describes the development of a DRAM, compatible with a standard CMOS ASIC process, that provides a memory density at least 4× higher than P-load SRAM under the same layout rules. It runs at speeds comparable to logic in the same process and uses circuitry that is reasonably simple and portable. The design employs Vdd-precharged bit lines, half-capacitance full-voltage dummy cells, and a simple complementary sense amplifier. The DRAM is organized as a number of small pages, allowing simple circuit design and low-power operation at a modest cost in area overhead. The paper also describes a power-conserving low-voltage-swing bus design that interfaces multiple pages to full-voltage-swing circuitry. Circuit and layout details are provided, along with experimental results for a 100 MHz 786K-bit embedded DRAM in a 0.5 µm process.
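A standard first-order CMOS estimate (not a figure from the paper) shows why a low-voltage-swing bus conserves power. Assume a bus line of capacitance $C$, switching activity $\alpha$ (0-to-1 transitions per cycle), clock frequency $f$, and a driver that draws its charge from the full $V_{DD}$ supply while swinging the line by only $V_{\text{swing}}$:

$$
P_{\text{full}} = \alpha\, C\, V_{DD}^{2}\, f, \qquad
P_{\text{low}} = \alpha\, C\, V_{\text{swing}}\, V_{DD}\, f
\quad\Longrightarrow\quad
\frac{P_{\text{low}}}{P_{\text{full}}} = \frac{V_{\text{swing}}}{V_{DD}}
$$

If the reduced swing is instead supplied from a separate rail at $V_{\text{swing}}$, the ratio becomes $(V_{\text{swing}}/V_{DD})^{2}$. The savings actually achieved depend on the specific bus and receiver circuits, whose overhead this estimate ignores.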

Citations: 19

Circuits and microarchitecture for gigahertz VLSI designs

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634860

K. Nowka, H. P. Hofstee

Citations: 2

Scalability in computing for today and tomorrow

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634843

D. Parry

Citations: 2

The hierarchical multi-bank DRAM: a high-performance architecture for memory integrated with processors

Proceedings Seventeenth Conference on Advanced Research in VLSI Pub Date : 1997-09-15 DOI: 10.1109/ARVLSI.1997.634862

T. Yamauchi, Lance Hammond, K. Olukotun

Abstract: A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing memory latency and improving memory bandwidth. However, a high-performance microprocessor will typically issue more accesses than the DRAM can handle, owing to the long cycle time of the embedded DRAM, especially in applications with significant memory requirements. A multi-bank DRAM can hide the long cycle time by allowing the DRAM to process multiple accesses in parallel, but it incurs a significant area penalty and therefore restricts the density of the embedded DRAM main memory. In this paper we propose a hierarchical multi-bank DRAM architecture that achieves high system performance with a minimal area penalty. In this architecture, each independent memory bank is divided into many semi-independent subbanks that share I/O and decoder resources. A hierarchical multi-bank DRAM with 4 main banks, each composed of 32 subbanks, occupies approximately the same area as a conventional 4-bank DRAM while performing like a 32-bank one: up to 65% better than a conventional 4-bank DRAM when integrated with a single-chip multiprocessor.
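Why more banks help with a long DRAM cycle time can be seen in a deliberately simplified model: requests issue in order, at most one per cycle; address `a` maps to bank `a % num_banks`; and a bank stays busy for `busy_cycles` after starting an access, so a request to a busy bank stalls the stream. The Python sketch below counts stall cycles under those assumptions. The function name and all parameters are hypothetical, and the model ignores the shared I/O and decoder resources that distinguish the paper's hierarchical subbanks from fully independent banks.

```python
from typing import List, Sequence


def simulate_bank_stalls(addresses: Sequence[int], num_banks: int, busy_cycles: int) -> int:
    """Return total stall cycles for an in-order, one-request-per-cycle access stream.

    Simplified model (assumption): address a maps to bank a % num_banks, and a bank
    cannot start a new access until busy_cycles after its previous access started.
    """
    bank_free_at: List[int] = [0] * num_banks  # cycle at which each bank becomes idle
    t = 0  # earliest cycle at which the next request may issue
    stalls = 0
    for addr in addresses:
        bank = addr % num_banks
        start = max(t, bank_free_at[bank])  # wait while the target bank is busy
        stalls += start - t
        bank_free_at[bank] = start + busy_cycles
        t = start + 1  # in-order issue: next request no earlier than the following cycle
    return stalls


if __name__ == "__main__":
    seq = list(range(16))
    print(simulate_bank_stalls(seq, num_banks=4, busy_cycles=8))
    print(simulate_bank_stalls(seq, num_banks=32, busy_cycles=8))
```

For sixteen sequential addresses and an 8-cycle bank busy time, 4 banks produce 12 stall cycles while 32 banks produce none. A stream that repeatedly hits the same address stalls equally in both cases, since extra banks help only when accesses spread across them.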

Citations: 35