Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235): latest papers, page 2

Dynamic IPC/clock rate optimization

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694788

D. Albonesi

Citations: 117

Design choices in the SHRIMP system: an empirical study

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694792

M. Blumrich, R. Alpert, Yuqun Chen, D. Clark, Stefanos N. Damianakis, C. Dubnicki, E. Felten, L. Iftode, Kai Li, M. Martonosi, R. A. Shillner

Citations: 50

Active Pages: a computation model for intelligent memory

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694774

M. Oskin, F. Chong, T. Sherwood

Citations: 351

Using prediction to accelerate coherence protocols

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1145/279358.279386

Shubhendu S. Mukherjee, M. Hill

Abstract: Most large shared-memory multiprocessors use directory protocols to keep per-processor caches coherent. Some memory references in such systems, however, suffer long latencies for misses to remotely cached blocks. To ameliorate this latency, researchers have augmented standard coherence protocols with optimizations for specific sharing patterns, such as read-modify-write, producer-consumer, and migratory sharing. This paper seeks to replace these directed solutions with general prediction logic that monitors coherence activity and triggers appropriate coherence actions. It takes the first step toward using general prediction to accelerate coherence protocols by developing and evaluating the Cosmos coherence message predictor. Cosmos predicts the source and type of the next coherence message for a cache block using logic that is an extension of Yeh and Patt's two-level PAp branch predictor. For five scientific applications running on 16 processors, Cosmos has prediction accuracies of 62% to 93%. Cosmos' high prediction accuracy is a result of predictable coherence message signatures that arise from stable sharing patterns of cache blocks.
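The two-level idea described in this abstract can be illustrated with a minimal Python sketch. It is not the Cosmos design itself: the class name, the history depth, and the message-type strings are all hypothetical, and the table organization is an assumption modeled loosely on a per-address two-level branch predictor. The first level keeps, per cache block, the last few (sender, type) tuples; the second level maps that history pattern to the message that followed it last time.

```python
from collections import defaultdict, deque


class MessagePredictor:
    """Illustrative two-level, per-block predictor of the next coherence message."""

    def __init__(self, depth=2):
        self.depth = depth
        # Level 1 (assumed layout): per-block history of the last `depth` messages.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        # Level 2 (assumed layout): per-block map from history pattern to next message.
        self.patterns = defaultdict(dict)

    def predict(self, block):
        """Return the predicted (sender, type) tuple, or None if the pattern is unseen."""
        return self.patterns[block].get(tuple(self.history[block]))

    def update(self, block, message):
        """Record the message that actually arrived and shift it into the history."""
        hist = self.history[block]
        if len(hist) == self.depth:
            self.patterns[block][tuple(hist)] = message
        hist.append(message)


def count_correct(predictor, block, stream):
    """Predict each message before observing it; return the number of correct predictions."""
    correct = 0
    for message in stream:
        if predictor.predict(block) == message:
            correct += 1
        predictor.update(block, message)
    return correct


# Hypothetical repeating signature for one block: three messages, repeated four times.
signature = [("P1", "get_rw_request"), ("P2", "get_ro_request"), ("P1", "inval_ack")]
correct = count_correct(MessagePredictor(depth=2), 0x40, signature * 4)
```

In this toy run the predictor needs one full pass plus a history's worth of messages to learn the signature, after which every message in the stable pattern is predicted correctly, which mirrors the abstract's point that accuracy comes from stable sharing patterns.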

Citations: 122

Memory dependence prediction using store sets

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694770

George Z. Chrysos, J. Emer

Citations: 346

Analytic evaluation of shared-memory systems with ILP processors

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694797

Daniel J. Sorin, Vijay S. Pai, S. Adve, M. Vernon, D. Wood

Citations: 114

Modeling program predictability

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1145/279358.279371

Yiannakis Sazeides, James E. Smith

Citations: 68

An analysis of correlation and predictability: what makes two-level branch predictors work

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694762

M. Evers, Sanjay J. Patel, R. Chappell, Y. Patt

Abstract: Pipeline flushes due to branch mispredictions are one of the most serious problems facing the designer of a deeply pipelined, superscalar processor. Many branch predictors have been proposed to help alleviate this problem, including two-level adaptive branch predictors and hybrid branch predictors. Numerous studies have shown which predictors and configurations best predict the branches in a given set of benchmarks. Some studies have also investigated effects, such as pattern history table interference, that can be detrimental to the performance of these predictors. However, little research has been done on which characteristics of branch behavior make predictors perform well. In this paper we investigate and quantify reasons why branches are predictable. We show that some of this predictability is not captured by the two-level adaptive branch predictors. An understanding of the predictability of branches may lead to insights ultimately resulting in better or less complex predictors. We also investigate and quantify what fraction of the branches in each benchmark is predictable using each of the methods described in this paper.

Citations: 117

Performance modeling and code partitioning for the DS architecture

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694789

Yinong Zhang, G. Adams

Abstract: DS (Decoupled-Superscalar) is a new microarchitecture that combines decoupled and superscalar techniques to exploit instruction-level parallelism. Issue bandwidth is increased while circuit complexity growth is controlled with little negative impact on performance. Programs for DS are compiled into two instruction substreams: the dominant substream navigates the control flow, and the rest of the computational task is shared between the dominant and subsidiary substreams. Each substream is processed by a separate superscalar core realizable with current VLSI technology. DS machines are binary compatible with superscalar machines having the same instruction set, and a family of DS machines is binary compatible without recompilation. DS run-time behavior is examined with an analytical model. A novel technique for controlling slip between substreams is introduced. Code partitioning issues of instruction count balancing and residence time balancing, important to any split-stream scheme, are discussed. Simulation shows DS achieves performance comparable to an aggressive superscalar, but with potentially less complex hardware and a faster clock rate.

Citations: 13

Pipeline gating: speculation control for energy reduction

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694769

Srilatha Manne, A. Klauser, D. Grunwald

Abstract: Branch prediction has enabled microprocessors to increase instruction-level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is essential for increasing the instructions per cycle (IPC), it does come at a cost. A large amount of unnecessary work results from wrong-path instructions entering the pipeline due to branch misprediction. Results generated with the SimpleScalar tool set using a 4-way issue pipeline and various branch predictors show an instruction overhead of 16% to 105% for every instruction committed. The instruction overhead will increase in the future as processors use more aggressive speculation and wider issue widths. In this paper we present an innovative method for power reduction which, unlike previous work that sacrificed flexibility or performance, reduces power in high-performance microprocessors without impacting performance. In particular, we introduce a hardware mechanism called pipeline gating to control rampant speculation in the pipeline. We present inexpensive mechanisms for determining when a branch is likely to mispredict, and for stopping wrong-path instructions from entering the pipeline. Results show up to a 38% reduction in wrong-path instructions with a negligible performance loss (≈1%). Best of all, even in programs with a high branch prediction accuracy, performance does not noticeably degrade. Our analysis indicates that there is little risk in implementing this method in existing processors since it does not impact performance and can benefit energy reduction.
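As a rough illustration of the gating idea in this abstract, the Python sketch below stalls instruction fetch while too many low-confidence branches are unresolved. The abstract does not spell out the mechanism, so everything here is an assumption: the class name, the counter scheme, the threshold value, and the idea that a separate confidence estimator supplies the `low_confidence` flag.

```python
class PipelineGate:
    """Illustrative fetch gate driven by a count of unresolved low-confidence branches."""

    def __init__(self, threshold=2):
        # Assumed policy: stall fetch once this many low-confidence branches are in flight.
        self.threshold = threshold
        self.low_confidence_in_flight = 0

    def on_branch_fetched(self, low_confidence):
        """Called when a branch enters the pipeline; the flag comes from a hypothetical estimator."""
        if low_confidence:
            self.low_confidence_in_flight += 1

    def on_branch_resolved(self, low_confidence):
        """Called when a branch resolves, whether or not it was predicted correctly."""
        if low_confidence:
            self.low_confidence_in_flight -= 1

    def fetch_allowed(self):
        """Fetch proceeds only while few low-confidence branches are outstanding."""
        return self.low_confidence_in_flight < self.threshold


gate = PipelineGate(threshold=2)
trace = [gate.fetch_allowed()]
gate.on_branch_fetched(low_confidence=True)
gate.on_branch_fetched(low_confidence=True)
trace.append(gate.fetch_allowed())   # two doubtful branches pending
gate.on_branch_resolved(low_confidence=True)
trace.append(gate.fetch_allowed())   # one resolved, fetch resumes
```

A lower threshold would block more wrong-path instructions but stall correct-path fetch more often; the trade-off between the two is what the abstract's reported numbers quantify.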

Citations: 479