H. Kadota, J. Miyake, I. Okabayashi, T. Maeda, T. Okamoto, Y. Takagi, K. Kagawa, E. Ichinohe
1987 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. DOI: 10.1109/ISSCC.1987.1157159 (https://doi.org/10.1109/ISSCC.1987.1157159)
A CMOS 32b microprocessor with on-chip cache and transmission lookahead buffer
THIS PAPER WILL DESCRIBE a single-chip CMOS 32b microprocessor supporting a smart memory hierarchy with on-chip Cache and TLB (Translation Lookaside Buffer). The chip, containing 372k transistors, has been fabricated using a double-metal-layer CMOS technology with 1µm design rules. It operates at an 80ns machine cycle time and dissipates 1.7W. A high-speed address translation device is essential for the virtual memory system, and two fully associative TLBs, one for supervisor mode and one for user mode, are implemented for that purpose. Each device has 32 entries composed of a 28b data field (SRAM), a 29b tag field (CAM) and LRU (Least-Recently-Used) replacement-control circuits: Figure 1. The page size can be varied from 512 to 4K bytes by 3b search-masking of virtual address tag bits. The TLB access time is less than 22ns, and a complete virtual-to-physical address translation takes a half machine cycle (40ns), whereas one carried out by an off-chip TLB takes about 100ns. The replacement algorithm, LRU, is realized by a 32 x 5b matrix of magnitude comparators and counters. The tag field includes task-ID (TID) bits, in addition to virtual address bits and a valid bit. The task-ID bits are used for checking and task-assigned invalidation of entries. These functions support effective management and rapid context switching in a multi-tasking system. The 2Kbyte Instruction Cache relieves an I/O bottleneck. The 1µm process technology permits the cache to be large enough for the multi-task environment. Its structure is two-way set associative, and its 256 x 2 entries are composed of 26b tag fields (SRAM) and 32b data fields (SRAM): Figure 2. This Cache is virtually addressed, and its access time is less than 18ns in the hit case.
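The TLB behaviour summarized in the abstract (task-ID tagging, search masking for a variable page size, LRU replacement, task-assigned invalidation) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the chip's circuit: the class and method names are hypothetical, the intermediate page sizes of 1K and 2K bytes are assumed from the 3b mask, and the per-entry age counter only loosely mirrors the comparator-and-counter LRU matrix.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    valid: bool = False
    tid: int = 0    # task ID stored alongside the virtual tag
    tag: int = 0    # virtual address bits above the 512-byte offset
    frame: int = 0  # physical page frame number
    age: int = 0    # LRU counter: 0 = most recently used


class TLB:
    """Hypothetical model of one fully associative TLB."""

    def __init__(self, n_entries=32, page_size=4096):
        self.offset_bits = page_size.bit_length() - 1
        if (1 << self.offset_bits) != page_size or not 9 <= self.offset_bits <= 12:
            raise ValueError("page size must be 512, 1K, 2K or 4K bytes (assumed steps)")
        # Search mask: low tag bits ignored when pages are larger than 512 bytes.
        self.ignore = (1 << (self.offset_bits - 9)) - 1
        # Ages start as a permutation 0..n-1 so there is always a unique oldest entry.
        self.entries = [Entry(age=i) for i in range(n_entries)]

    def _touch(self, used):
        # Entries that were more recent than `used` grow one step older.
        for e in self.entries:
            if e.age < used.age:
                e.age += 1
        used.age = 0

    def lookup(self, tid, vaddr):
        """Return the physical address on a hit, or None on a miss."""
        tag = vaddr >> 9
        for e in self.entries:
            if e.valid and e.tid == tid and (e.tag | self.ignore) == (tag | self.ignore):
                self._touch(e)
                offset = vaddr & ((1 << self.offset_bits) - 1)
                return (e.frame << self.offset_bits) | offset
        return None

    def insert(self, tid, vaddr, paddr):
        # Prefer an invalid entry; otherwise replace the least recently used one.
        victim = next((e for e in self.entries if not e.valid), None)
        if victim is None:
            victim = max(self.entries, key=lambda e: e.age)
        victim.valid, victim.tid = True, tid
        victim.tag = vaddr >> 9
        victim.frame = paddr >> self.offset_bits
        self._touch(victim)

    def invalidate_task(self, tid):
        # Task-assigned invalidation: drop every entry that belongs to one task.
        for e in self.entries:
            if e.tid == tid:
                e.valid = False
```

In this model a context switch needs no flush, because entries of other tasks simply fail the task-ID comparison; `invalidate_task` is only needed when a task's mappings change or its ID is reused.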