Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)最新文献

筛选
英文 中文
Dynamic IPC/clock rate optimization 动态IPC/时钟速率优化
D. Albonesi
{"title":"Dynamic IPC/clock rate optimization","authors":"D. Albonesi","doi":"10.1109/ISCA.1998.694788","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694788","url":null,"abstract":"Current microprocessor designs set the functionality and clock rate of the chip at design time based on the configuration that achieves the best overall performance over a range of target applications. The result may be poor performance when running applications whose requirements are not well-matched to the particular hardware organization chosen. We present a new approach called Complexity-Adaptive Processors (CAPs) in which the IPC/clock rate tradeoff can be altered at runtime to dynamically match the changing requirements of the instruction stream. By exploiting repeater methodologies used increasingly in deep sub-micron designs, CAPs achieve this flexibility with potentially no cycle time impact compared to a fixed architecture. Our preliminary results in applying this approach to on-chip caches and instruction queues indicate that CAPs have the potential to significantly outperform conventional approaches on workloads containing both general purpose and scientific applications.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130441429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 117
Design choices in the SHRIMP system: an empirical study SHRIMP系统中的设计选择:一个实证研究
M. Blumrich, R. Alpert, Yuqun Chen, D. Clark, Stefanos N. Damianakis, C. Dubnicki, E. Felten, L. Iftode, Kai Li, M. Martonosi, R. A. Shillner
{"title":"Design choices in the SHRIMP system: an empirical study","authors":"M. Blumrich, R. Alpert, Yuqun Chen, D. Clark, Stefanos N. Damianakis, C. Dubnicki, E. Felten, L. Iftode, Kai Li, M. Martonosi, R. A. Shillner","doi":"10.1109/ISCA.1998.694792","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694792","url":null,"abstract":"The SHRIMP cluster-computing system has progressed to a point of relative maturity; a variety of applications are running on a 16-node system. We have enough experience to understand what we did right and wrong in designing and building the system. In this paper we discuss some of the lessons we learned about computer architecture, and about the challenges involved in building a significant working system in an academic research environment. We evaluate significant design choices by modifying the network interface firmware and the system software in order to empirically compare our design to other approaches.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130763034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
Active Pages: a computation model for intelligent memory 活动页面:智能内存的计算模型
M. Oskin, F. Chong, T. Sherwood
{"title":"Active Pages: a computation model for intelligent memory","authors":"M. Oskin, F. Chong, T. Sherwood","doi":"10.1109/ISCA.1998.694774","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694774","url":null,"abstract":"Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the memory system. An Active Page consists of a page of data and a set of associated functions which can operate upon that data. We describe an implementation of Active Pages on RXDram (Reconfigurable Architecture DRAM), a memory system based upon the integration of DRAM and reconfigurable logic. Results from the SimpleScalar simulator demonstrate up to 1000X speedups on several applications using the RADram system versus conventional memory systems. We also explore the sensitivity of our results to implementations in other memory technologies.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133832256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 351
Using prediction to accelerate coherence protocols 利用预测加速相干协议
Shubhendu S. Mukherjee, M. Hill
{"title":"Using prediction to accelerate coherence protocols","authors":"Shubhendu S. Mukherjee, M. Hill","doi":"10.1145/279358.279386","DOIUrl":"https://doi.org/10.1145/279358.279386","url":null,"abstract":"Most large shared-memory multiprocessors use directory protocols to keep per-processor caches coherent. Some memory references in such systems, however suffer long latencies for misses to remotely-cached blocks. To ameliorate this latency, researchers have augmented standard coherence protocols with optimizations for specific sharing patterns, such as read-modify-write, producer-consumer and migratory sharing. This paper seeks to replace these directed solutions with general prediction logic that monitors coherence activity and triggers appropriate coherence actions. This paper takes the first step toward using general prediction to accelerate coherence protocols by developing and evaluating the Cosmos coherence message predictor. Cosmos predicts the source and type of the next coherence message for a cache block using logic that is an extension of Yeh and Patt's two-level PAp branch predictor. For five scientific applications running on 16 processors, Cosmos has prediction accuracies of 62% to 93%. Cosmos' high prediction accuracy is a result of predictable coherence message signatures that arise from stable sharing patterns of cache blocks.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123977488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 122
Memory dependence prediction using store sets 使用存储集进行内存依赖性预测
George Z. Chrysos, J. Emer
{"title":"Memory dependence prediction using store sets","authors":"George Z. Chrysos, J. Emer","doi":"10.1109/ISCA.1998.694770","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694770","url":null,"abstract":"For maximum performance, an out-of-order processor must issue load instructions as early as possible, while avoiding memory-order violations with prior store instructions that write to the same memory location. One approach is to use memory dependence prediction to identify the stores upon which a load depends, and communicate that information to the instruction scheduler. We designate the set of stores upon which each load has depended as the load's \"store set\". The processor can discover and use a load's store set to accurately predict the earliest time the load can safely execute. We show that store sets accurately predict memory dependencies in the context of large instruction window, superscalar machines, and allow for near-optimal performance compared to an instruction scheduler with perfect knowledge of memory dependencies. In addition, we explore the implementation aspects of store sets, and describe a low cost implementation that achieves nearly optimal performance.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129258578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 346
Analytic evaluation of shared-memory systems with ILP processors 具有ILP处理器的共享内存系统的分析评价
Daniel J. Sorin, Vijay S. Pai, S. Adve, M. Vernon, D. Wood
{"title":"Analytic evaluation of shared-memory systems with ILP processors","authors":"Daniel J. Sorin, Vijay S. Pai, S. Adve, M. Vernon, D. Wood","doi":"10.1109/ISCA.1998.694797","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694797","url":null,"abstract":"This paper develops and validates an analytical model for evaluating various types of architectural alternatives for shared-memory systems with processors that aggressively exploit instruction-level parallelism. Compared to simulation, the analytical model is many orders of magnitude faster to solve, yielding highly accurate system-performance estimates in seconds. The model input parameters characterize the ability of an application to exploit instruction-level parallelism as well as the interaction between the application and the memory system architecture. A trace-driven simulation methodology is developed that allows these parameters to be generated over 100 times faster than with a detailed execution-driven simulator. Finally, this paper shows that the analytical model can be used to gain insights into application performance and to evaluate architectural design trade-offs.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116121993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 114
Modeling program predictability 建模程序可预测性
Yiannakis Sazeides, James E. Smith
{"title":"Modeling program predictability","authors":"Yiannakis Sazeides, James E. Smith","doi":"10.1145/279358.279371","DOIUrl":"https://doi.org/10.1145/279358.279371","url":null,"abstract":"Basic properties of program predictability-for both values and control-are defined and studied. We take the view that program predictability originates at certain points during a program's execution, flows through subsequent instructions, and then ends at other points in the program. These key components of predictability: generation, propagation, and termination; are defined in terms of a model. The model is based on a graph derived from dynamic data dependences and a predictor. Using the SPEC95 benchmarks, we analyze the predictability phenomena both separately and in combination. Examples are provided to illustrate relationships between model-based characteristics and program constructs. It is shown that most predictability derives from program control structure and immediate values, not program input data. Furthermore, most predictability originates from a relatively small number of generate points. The analysis of obtained results suggests a number of ramifications regarding predictability and its use.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123637293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 68
An analysis of correlation and predictability: what makes two-level branch predictors work 相关性和可预测性的分析:是什么使两级分支预测器工作
M. Evers, Sanjay J. Patel, R. Chappell, Y. Patt
{"title":"An analysis of correlation and predictability: what makes two-level branch predictors work","authors":"M. Evers, Sanjay J. Patel, R. Chappell, Y. Patt","doi":"10.1109/ISCA.1998.694762","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694762","url":null,"abstract":"Pipeline flushes due to branch mispredictions is one of the most serious problems facing the designer of a deeply pipelined, superscalar processor. Many branch predictors have been proposed to help alleviate this problem, including two-level adaptive branch predictors and hybrid branch predictors. Numerous studies have shown which predictors and configurations best predict the branches in a given set of benchmarks. Some studies have also investigated effects, such as pattern history table interference, that can be detrimental to the performance of these predictors. However, little research has been done on which characteristics of branch behavior make predictors perform well. In this paper we investigate and quantify reasons why branches are predictable. We show that some of this predictability is not captured by the two-level adaptive branch predictors. An understanding of the predictability of branches may lead to insights ultimately resulting in better or less complex predictors. We also investigate and quantify what function of the branches in each benchmark is predictable using each of the methods described in this paper.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126367708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 117
Performance modeling and code partitioning for the DS architecture DS体系结构的性能建模和代码分区
Yinong Zhang, G. Adams
{"title":"Performance modeling and code partitioning for the DS architecture","authors":"Yinong Zhang, G. Adams","doi":"10.1109/ISCA.1998.694789","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694789","url":null,"abstract":"DS (Decoupled-Superscalar) is a new microarchitecture that combines decoupled and superscalar techniques to exploit instruction level parallelism. Issue bandwidth is increased while circuit complexity growth is controlled with little negative impact on performance. Programs for DS are compiled into two instruction substreams: the dominant substream navigates the control flow and the rest of computational task is shared between the dominant and subsidiary substreams. Each substream is processed by a separate superscalar core realizable with current VLSI technology. DS machines are binary compatible with superscalar machines having the same instruction set, and a family of DS machines is binary compatible without recompilation. DS run time behavior is examined with an analytical model. A novel technique for controlling slip between substreams is introduced. Code partitioning issues of instruction count balancing and residence time balancing, important to any split-stream scheme, are discussed. Simulation shows DS achieves performance comparable to an aggressive superscalar, but with potentially less complex hardware and faster clock rate.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124321680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Pipeline gating: speculation control for energy reduction 管道门控:降低能耗的投机控制
Srilatha Manne, A. Klauser, D. Grunwald
{"title":"Pipeline gating: speculation control for energy reduction","authors":"Srilatha Manne, A. Klauser, D. Grunwald","doi":"10.1109/ISCA.1998.694769","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694769","url":null,"abstract":"Branch prediction has enabled microprocessors to increase instruction level parallelism (ILP) by allowing programs to speculatively execute beyond control boundaries. Although speculative execution is essential for increasing the instructions per cycle (IPC), it does come at a cost. A large amount of unnecessary work results from wrong-path instructions entering the pipeline due to branch misprediction. Results generated with the SimpleScalar tool set using a 4-way issue pipeline and various branch predictors show an instruction overhead of 16% to 105% for event instruction committed. The instruction overhead will increase in the future as processors use more aggressive speculation and wider issue widths. In this paper we present an innovative method for power reduction ,which, unlike previous work that sacrificed flexibility or performance reduces power in high-performance microprocessors without impacting performance. In particular we introduce a hardware mechanism called pipeline gating to control rampant speculation in the pipeline. We present inexpensive mechanisms for determining when a branch is likely to mispredict, and for stopping wrong-path instructions from entering the pipeline. Results show up to a 38% reduction in wrong-path instructions with a negligible performance loss (/spl ap/1%). Best of all, even in programs with a high branch prediction accuracy, performance does not noticeable degrade. Our analysis indicates that there is little risk in implementing this method in existing processors since it does not impact performance and can benefit energy reduction.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121615707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 479
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信