{"title":"Design of instruction stream buffer with trace support for X86 processors","authors":"J. Chiu, I. Huang, C. Chung","doi":"10.1109/ICCD.2000.878299","DOIUrl":null,"url":null,"abstract":"The potential performance of superscalar microprocessors can be exploited only when fed with sufficient instruction bandwidth. The front-end units, the instruction stream buffer and the fetcher, are the key elements achieving this goal. In most current processors, instruction stream buffers cannot support the instruction sequence beyond a basic block. The fetch rates are constrained by the branch barriers. In x86 processors, the split-line instruction problem worsens this constrain. We propose a design to improve instruction stream buffer performance by coupling it with BTB to support trace prediction. According to the simulation results of such an instruction stream buffer, the maximum fetch bandwidth can reach 8.42 x86 instructions per cycle. Furthermore, we suggest that the instruction stream buffer consists of two 64-bytes entries. Compared with other existing designs, this instruction stream buffer can improve performance by 90% over current x86 processor instruction fetching on average.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2000 International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2000.878299","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
The potential performance of superscalar microprocessors can be exploited only when fed with sufficient instruction bandwidth. The front-end units, the instruction stream buffer and the fetcher, are the key elements achieving this goal. In most current processors, instruction stream buffers cannot support the instruction sequence beyond a basic block. The fetch rates are constrained by the branch barriers. In x86 processors, the split-line instruction problem worsens this constrain. We propose a design to improve instruction stream buffer performance by coupling it with BTB to support trace prediction. According to the simulation results of such an instruction stream buffer, the maximum fetch bandwidth can reach 8.42 x86 instructions per cycle. Furthermore, we suggest that the instruction stream buffer consists of two 64-bytes entries. Compared with other existing designs, this instruction stream buffer can improve performance by 90% over current x86 processor instruction fetching on average.