Conference Proceedings. The 24th Annual International Symposium on Computer Architecture: Latest Papers

Exploiting Instruction Level Parallelism In Processors By Caching Scheduled Groups

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI: 10.1145/264107.264125

R. Nair, Martin E. Hopkins

Citations: 106

Prefetching Using Markov Predictors

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI: 10.1145/264107.264207

Douglas J. Joseph, D. Grunwald

Citations: 673

Tolerating Multiple Failures In RAID Architectures With Optimal Storage And Uniform Declustering

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI: 10.1145/264107.264132

G. A. Alvarez, W. Burkhard, F. Cristian

Citations: 107

A Language For Describing Predictors And Its Application To Automatic Synthesis

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI: 10.1145/264107.264212

J. Emer, Nicholas C. Gloy

Abstract: As processor architectures have increased their reliance on speculative execution to improve performance, the importance of accurate prediction of what to execute speculatively has increased. Furthermore, the types of values predicted have expanded from the ubiquitous branch and call/return targets to the prediction of indirect jump targets, cache ways and data values. In general, the prediction process is one of identifying the current state of the system, and making a prediction for some as yet uncomputed value based on that state. Prediction accuracy is improved by learning what is a good prediction for that state using a feedback process at the time the predicted value is actually computed. While there have been a number of efforts to formally characterize this process, we have taken the approach of providing a simple algebraic-style notation that allows one to express this state identification and feedback process. This notation allows one to describe a wide variety of predictors in a uniform way. It also facilitates the use of an efficient search technique called genetic programming, which is loosely modeled on the natural evolutionary process, to explore the design space. In this paper we describe our notation and the results of the application of genetic programming to the design of branch and indirect jump predictors.

Citations: 54

Improving Superscalar Instruction Dispatch And Issue By Exploiting Dynamic Code Sequences

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI: 10.1145/264107.264119

S. Vajapeyam, T. Mitra

Abstract: Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch bandwidth cannot be exploited unless pipeline stages further downstream correspondingly improve. In particular, register renaming a large number of instructions per cycle is difficult. A large instruction window, needed to receive multiple basic blocks per cycle, will slow down dependence resolution and instruction issue. This paper addresses these and related issues by proposing (i) partitioning of the instruction window into multiple blocks, each holding a dynamic code sequence; (ii) logical partitioning of the register file into a global file and several local files, the latter holding registers local to a dynamic code sequence; (iii) the dynamic recording and reuse of register renaming information for registers local to a dynamic code sequence. Performance studies show these mechanisms improve performance over traditional superscalar processors by factors ranging from 1.5 to a little over 3 for the SPEC Integer programs. Next, it is observed that several of the loops in the benchmarks display vector-like behavior during execution, even if the static loop bodies are likely complex for compile-time vectorization. A dynamic loop vectorization mechanism that builds on top of the above mechanisms is briefly outlined. The mechanism vectorizes up to 60% of the dynamic instructions for some programs, albeit the average number of iterations per loop is quite small.

Citations: 120

DataScalar Architectures

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI: 10.1145/264107.264215

D. Burger, S. Kaxiras, J. Goodman

Citations: 52

Target Prediction For Indirect Jumps

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-06-01 DOI: 10.1109/ISCA.1997.604707

Po-Yung Chang, E. Hao, Y. Patt

Citations: 153

Dynamic Instruction Reuse

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-05-01 DOI: 10.1145/264107.264200

Avinash Sodani, G. Sohi

Citations: 386

Trading Conflict And Capacity Aliasing In Conditional Branch Predictors

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1997-05-01 DOI: 10.1145/264107.264211

P. Michaud, André Seznec, R. Uhlig

Abstract: As modern microprocessors employ deeper pipelines and issue multiple instructions per cycle, they are becoming increasingly dependent on accurate branch prediction. Because hardware resources for branch-predictor tables are invariably limited, it is not possible to hold all relevant branch history for all active branches at the same time, especially for large workloads consisting of multiple processes and operating-system code. The problem that results, commonly referred to as aliasing in the branch-predictor tables, is in many ways similar to the misses that occur in finite-sized hardware caches. In this paper we propose a new classification for branch aliasing based on the three-Cs model for caches, and show that conflict aliasing is a significant source of mispredictions. Unfortunately, the obvious method for removing conflicts, adding tags and associativity to the predictor tables, is not a cost-effective solution. To address this problem, we propose the skewed branch predictor, a multi-bank, tag-less branch predictor, designed specifically to reduce the impact of conflict aliasing. Through both analytical and simulation models, we show that the skewed branch predictor removes a substantial portion of conflict aliasing by introducing redundancy to the branch-predictor tables. Although this redundancy increases capacity aliasing compared to a standard one-bank structure of comparable size, our simulations show that the reduction in conflict aliasing overcomes this effect to yield a gain in prediction accuracy. Alternatively, we show that a skewed organization can achieve the same prediction accuracy as a standard one-bank organization but with half the storage requirements.
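
Illustrative sketch (not from the paper): the abstract above describes a multi-bank, tag-less predictor that uses redundancy to reduce conflict aliasing. The Python below is a minimal toy of that idea, assuming three banks of 2-bit saturating counters combined by majority vote. The class name `SkewedPredictor`, the table size, the update policy (all banks updated on every outcome), and the index functions in `_indices` are hypothetical choices made for illustration; they are not the paper's actual skewing functions.

```python
class SkewedPredictor:
    """Toy tag-less predictor: three banks of 2-bit counters, majority vote."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Counters range over 0..3; start at 1 (weakly not-taken).
        self.banks = [[1] * (1 << index_bits) for _ in range(3)]

    def _indices(self, pc, history):
        # Hypothetical index functions (an assumption, not the paper's):
        # each bank mixes the same bits differently, so two branches that
        # collide in one bank are unlikely to collide in the other two.
        v = (pc >> 2) ^ (history << 5)
        return (
            v & self.mask,
            (v * 3 + (v >> 7)) & self.mask,
            (v * 5 + (v >> 11)) & self.mask,
        )

    def predict(self, pc, history):
        # Each bank votes "taken" if its counter is in the upper half.
        votes = sum(
            bank[i] >= 2 for bank, i in zip(self.banks, self._indices(pc, history))
        )
        return votes >= 2

    def update(self, pc, history, taken):
        # Assumed policy: update every bank with the actual outcome.
        for bank, i in zip(self.banks, self._indices(pc, history)):
            if taken:
                bank[i] = min(3, bank[i] + 1)
            else:
                bank[i] = max(0, bank[i] - 1)


# Two branches chosen so that they share an entry in bank 0 only.
BRANCH_A, BRANCH_B = 0x1000, 0x2000
p = SkewedPredictor()
for _ in range(4):
    p.update(BRANCH_A, 0, True)   # branch A is always taken
    p.update(BRANCH_B, 0, False)  # branch B is never taken
pred_a = p.predict(BRANCH_A, 0)
pred_b = p.predict(BRANCH_B, 0)
print(pred_a, pred_b)
```

In this toy run the two branches fight over one shared counter in bank 0, but the other two banks give each branch a private counter, so the majority vote still follows each branch's own behavior. A single bank indexed the same way as bank 0 would keep both branches on the same counter.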

Citations: 188

VM-based Shared Memory On Low-latency, Remote-memory-access Networks

Conference Proceedings. The 24th Annual International Symposium on Computer Architecture Pub Date : 1996-11-01 DOI: 10.1145/264107.264163

L. Kontothanassis, G. Hunt, R. Stets, N. Hardavellas, Michal Cierniak, Srinivasan Parthasarathy, Wagner Meira Jr., S. Dwarkadas, M. Scott

Abstract: Recent technological advances have produced network interfaces that provide users with very low-latency access to the memory of remote machines. We examine the impact of such networks on the implementation and performance of software DSM. Specifically, we compare two DSM systems, Cashmere and TreadMarks, on a 32-processor DEC Alpha cluster connected by a Memory Channel network. Both Cashmere and TreadMarks use virtual memory to maintain coherence on pages, and both use lazy, multi-writer release consistency. The systems differ dramatically, however, in the mechanisms used to track sharing information and to collect and merge concurrent updates to a page, with the result that Cashmere communicates much more frequently, and at a much finer grain. Our principal conclusion is that low-latency networks make DSM based on fine-grain communication competitive with more coarse-grain approaches, but that further hardware improvements will be needed before such systems can provide consistently superior performance. In our experiments, Cashmere scales slightly better than TreadMarks for applications with false sharing. At the same time, it is severely constrained by limitations of the current Memory Channel hardware. In general, performance is better for TreadMarks.

Citations: 78