H. Kadota, J. Miyake, I. Okabayashi, T. Maeda, T. Okamoto, Y. Takagi, K. Kagawa, E. Ichinohe
A CMOS 32b microprocessor with on-chip cache and transmission lookahead buffer
1987 IEEE International Solid-State Circuits Conference. Digest of Technical Papers
DOI: 10.1109/ISSCC.1987.1157159
Citations: 6
Abstract
THIS PAPER WILL DESCRIBE a single-chip CMOS 32b microprocessor supporting a smart memory hierarchy with on-chip Cache and TLB (Translation Lookaside Buffer). The chip, containing 372k transistors, has been fabricated using a double-metal-layer CMOS technology with 1µm design rules. It operates at an 80ns machine cycle time and dissipates 1.7W. A high-speed address translation device is essential for a virtual memory system, and two fully associative TLBs, one for supervisor mode and one for user mode, are implemented for that purpose. Each TLB has 32 entries composed of a 28b data field (SRAM), a 29b tag field (CAM) and LRU (Least-Recently-Used) replacement control circuits: Figure 1. The page size can be varied from 512 to 4K bytes by 3b search masking of virtual address tag bits. The TLB access time is less than 22ns, so a complete virtual-to-physical address translation fits in half a machine cycle (40ns), whereas a translation carried out by an off-chip TLB takes about 100ns. The LRU replacement algorithm is realized by a 32 x 5b matrix of magnitude comparators and counters. The tag field includes task-ID (TID) bits, in addition to virtual address bits and a valid bit. The task-ID bits are used for checking and for task-assigned invalidation of entries. These functions provide effective management and rapid context switching in a multi-tasking system. The LKbyte Instruction Cache relieves an I/O bottleneck. The 1µm process technology permits the cache to be large enough for the multi-task environment. Its structure is two-way set associative, and its 256 x 2 entries are composed of 26b tag fields (SRAM) and 32b data fields (SRAM): Figure 2. This Cache is virtually addressed, and its access time is less than 18ns in the hit case.
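
The TLB behaviour summarized in the abstract (a fully associative search, a page size selected by masking tag bits, task-ID matching, task-wide invalidation, and LRU replacement) can be illustrated with a small software model. The sketch below is not the authors' circuit. The names `TLB`, `Entry`, `lookup`, `insert` and `invalidate_task` are hypothetical, 32-bit addresses are assumed, and the counter-based LRU scheme is only one plausible reading of the comparator-and-counter matrix the abstract mentions.

```python
from dataclasses import dataclass
from typing import Optional

ADDR_MASK = 0xFFFFFFFF  # assumption: 32-bit virtual and physical addresses


@dataclass
class Entry:
    valid: bool = False
    tid: int = 0      # task ID stored alongside the virtual tag
    vtag: int = 0     # virtual address with the page-offset bits cleared
    frame: int = 0    # page-aligned physical base address
    age: int = 0      # LRU counter: 0 = most recent, n-1 = least recent


class TLB:
    """Hypothetical software model of one fully associative TLB."""

    def __init__(self, entries: int = 32, page_shift: int = 12):
        if not 9 <= page_shift <= 12:  # 512-byte to 4K-byte pages
            raise ValueError("page size must be 512, 1K, 2K or 4K bytes")
        self.offset_mask = (1 << page_shift) - 1
        # Search mask: address bits below the page size never take part
        # in the tag comparison.
        self.tag_mask = ADDR_MASK & ~self.offset_mask
        # Ages start as a permutation 0..n-1 and remain one after each touch,
        # so 32 entries need only 5-bit counters.
        self.entries = [Entry(age=i) for i in range(entries)]

    def _touch(self, idx: int) -> None:
        """Mark entry idx as most recently used."""
        old = self.entries[idx].age
        for e in self.entries:
            if e.age < old:
                e.age += 1
        self.entries[idx].age = 0

    def lookup(self, tid: int, vaddr: int) -> Optional[int]:
        """Return the physical address on a hit, or None on a miss."""
        vtag = vaddr & self.tag_mask
        for i, e in enumerate(self.entries):
            if e.valid and e.tid == tid and e.vtag == vtag:
                self._touch(i)
                return e.frame | (vaddr & self.offset_mask)
        return None

    def insert(self, tid: int, vaddr: int, frame: int) -> None:
        """Fill an entry after a miss: invalid entries first, then the LRU one."""
        victim = max(
            range(len(self.entries)),
            key=lambda i: (not self.entries[i].valid, self.entries[i].age),
        )
        e = self.entries[victim]
        e.valid, e.tid = True, tid
        e.vtag, e.frame = vaddr & self.tag_mask, frame & self.tag_mask
        self._touch(victim)

    def invalidate_task(self, tid: int) -> None:
        """Drop every entry that belongs to one task."""
        for e in self.entries:
            if e.tid == tid:
                e.valid = False
```

With 512-byte pages (`page_shift=9`), a mapping inserted for virtual address `0x1234` also serves `0x13FF` but not `0x1400`. With 4K-byte pages the same entry covers `0x1000` through `0x1FFF`. A lookup under a different task ID misses even when the virtual address matches, which is what lets entries from several tasks coexist across a context switch.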