Efficient hardware for multiway jumps and pre-fetches
K. Karplus, A. Nicolau
MICRO 18, December 1985
https://doi.org/10.1145/18927.18908
Two recent trends in computer architecture have been increasing the size and complexity of microprograms: RISC machines, array processors, and VLIW machines are programmed directly in microcode, and CISC machines have large microcode programs that interpret higher-level machine instructions. The difficulty of developing and maintaining large microprograms suggests that they should be written in a high-level language and compiled by optimizing compilers. Conventional optimizing compilers have not been particularly effective for microcode, because they optimize primarily within basic blocks (that is, segments of sequential code uninterrupted by conditional jumps or jump targets), which are too small (3-5 instructions) to allow much code rearrangement. Hand coding, though slow and error-prone, has offered significant performance advantages over compiled microcode.

Recent advances in optimization techniques, notably trace scheduling [Fisher81] and percolation scheduling [Nicolau84], offer code rearrangement that crosses basic block boundaries. These techniques tend to cluster conditional jumps. Since conditional jumps make up 15-33% of the initial microcode, combining the conditional jumps of a cluster into a single multiway jump offers substantial improvements in speed.

Various schemes have been proposed in the past for multiway jumps, but they have generally been unsatisfactory. One common problem is insufficient generality to represent the clusters of conditional jumps found by the new optimization techniques. Another, potentially more serious, problem is that the multiway jump mechanisms interfere with instruction prefetching.

A microcode memory system must operate at the speed of the instruction decoder and the data path. Although fast memories are available, they are small and expensive. Recent memory chip manufacturing trends favor cheap, large memories that are relatively slow. With current processor and memory speeds, microcode instruction cycles are already 8-16 times faster than access times for large memory chips.
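The idea of combining a cluster of conditional jumps into one multiway jump can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the hardware mechanism the paper proposes: the function names, the three-condition cluster, the priority ordering of the conditions, and the lookup-table encoding are all hypothetical choices made for illustration.

```python
# Hypothetical illustration only: the names, the priority order, and the
# table encoding are assumptions, not the design described in the paper.

def sequential_jumps(c0, c1, c2):
    """A cluster of three conditional jumps, tested one per step.

    Returns (target label, number of jump steps executed).
    """
    steps = 1
    if c0:
        return "T0", steps
    steps += 1
    if c1:
        return "T1", steps
    steps += 1
    if c2:
        return "T2", steps
    return "FALLTHROUGH", steps


def build_multiway_table(targets, fallthrough):
    """Precompute one target per combination of condition bits.

    Bit i of the table index is condition i. The first true condition
    (lowest i) wins, matching the order of the sequential jumps.
    """
    n = len(targets)
    table = []
    for bits in range(1 << n):
        target = fallthrough
        for i in range(n):
            if bits & (1 << i):
                target = targets[i]
                break
        table.append(target)
    return table


def multiway_jump(table, c0, c1, c2):
    """Resolve the whole cluster with a single table lookup (one step)."""
    index = int(c0) | (int(c1) << 1) | (int(c2) << 2)
    return table[index], 1


TABLE = build_multiway_table(["T0", "T1", "T2"], "FALLTHROUGH")
```

In this sketch the sequential version takes up to three steps to reach its target, while the table version always takes one, at the cost of a table with one entry per combination of conditions. How a real microcode sequencer would encode the targets, and how that encoding interacts with prefetching, is exactly the question the abstract raises and is not modeled here.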