{"title":"Beyond performance: some (other) challenges for systems design","authors":"E. Kronstadt","doi":"10.1109/HPCA.2003.1183531","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183531","url":null,"abstract":"Traditionally we have focused on higher performance semiconductor technology, higher performance processors, higher performance memory systems, higher performance interconnect, higher performance software, etc., that is until we began focusing on lower power semiconductor technology, more power efficient processors, less power hungry interconnect etc., at the same time we have tried to reduce the cost of each of these components. The point is, that despite the ever-expanding nature of our system approach, historically we have taken essentially a componentized view. This appears to be changing, as evidenced by recent additions to our technical vocabulary: “systems on a chip,” “hardware-software codesign,” “autonomic computing.” All of these point to the possibility of a more holistic approach. We will examine this phenomenon to see if there really is something new here, what is driving it, and what are the consequences and challenges.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125367914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Microarchitecture and performance analysis of a SPARC-V9 microprocessor for enterprise server systems","authors":"M. Sakamoto, A. Katsuno, Aiichiro Inoue, T. Asakawa, H. Ueno, K. Morita, Yasunori Kimura","doi":"10.1109/HPCA.2003.1183533","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183533","url":null,"abstract":"We developed a 1.3-GHz SPARC-V9 processor: the SPARC64 V. This processor is designed to address requirements for enterprise servers and high-performance computing. Processing speed under multiuser interactive workloads is very sensitive to system balance because of the large number of memory requests included. From many years of experience with such workloads in mainframe system developments, we give importance to design a well-balanced communication structure. To accomplish this task, a system-level performance study must begin at an early please. Therefore we developed a performance model, which consists of a detailed processor model and detailed memory model, before hardware design was started. We updated it continuously. Once a logic simulator became available, we used it to verify the performance model for improving its accuracy. The model quite effectively enabled us to achieve performance goals and finish development quickly. This paper describes the SPARC64 V microarchitecture and performance analyses for hardware design.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127663918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Catching accurate profiles in hardware","authors":"S. Narayanasamy, T. Sherwood, S. Sair, B. Calder, G. Varghese","doi":"10.1109/HPCA.2003.1183545","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183545","url":null,"abstract":"Run-time optimization is one of the most important ways of getting performance out of modern processors. Techniques such as prefetching, trace caching, memory disambiguation etc., are all based upon the principle of observation followed by adaptation, and all make use of some sort of profile information gathered at run-time. Programs are very complex, and the real trick in generating useful run-time profiles is sifting through all the unimportant and infrequently occurring events to find those that are important enough to warrant optimization. In this paper, we present the multi-hash architecture to catch important events even in the presence of extensive noise. Multi-hash uses a small amount of area, between 7 to 16 Kilo-bytes, to accurately capture these important events in hardware, without requiring any software support. This is achieved using multiple hash tables for the filtering, and interval-based profiling to help identify how important an event is in relationship to all the other events. We evaluate our design for value and edge profiling, and show that over a set of benchmarks, we get an average error less than 1%.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131560385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory system behavior of Java-based middleware","authors":"Martin Karlsson, Kevin E. Moore, Erik Hagersten, D. Wood","doi":"10.1109/HPCA.2003.1183540","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183540","url":null,"abstract":"In this paper, we present a detailed characterization of the memory system, behavior of ECperf and SPECjbb using both commercial server hardware and Simics full-system simulation. We find that the memory footprint and primary working sets of these workloads are small compared to other commercial workloads (e.g. on-line transaction processing), and that a large fraction of the working sets are shared between processors. We observed two key differences between ECperf and SPECjbb that highlight the importance of isolating the behavior of the middle tier. First, ECperf has a larger instruction footprint, resulting in much higher miss rates for intermediate-size instruction caches. Second, SPECjbb's data set size increases linearly as the benchmark scales up, while ECperf's remains roughly constant. This difference can lead to opposite conclusions on the design of multiprocessor memory systems, such as the utility of moderate sized (i.e. 1 MB) shared caches in a chip multiprocessor.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114421965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power-aware control speculation through selective throttling","authors":"Juan L. Aragón, José González, Antonio González","doi":"10.1109/HPCA.2003.1183528","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183528","url":null,"abstract":"With the constant advances in technology that lead to the increasing of the transistor count and processor frequency, power dissipation is becoming one of the major issues in high-performance processors. These processors increase their clock frequency by lengthening the pipeline, which puts more pressure on the branch prediction engine since branches take longer to be resolved. Branch mispredictions are responsible for around 28% of the power dissipated by a typical processor due to the useless activities performed by instructions that are squashed. This work focuses on reducing the power dissipated by mis-speculated instructions. We propose selective throttling as an effective way of triggering different power-aware techniques (fetch throttling, decode throttling or disabling the selection logic). The particular set of techniques applied to each branch is dynamically chosen depending on the branch prediction confidence level. For branches with a low confidence on the prediction, the most aggressive throttling mechanism is used whereas high confidence branch predictions trigger the least aggressive techniques. Results show that combining fetch bandwidth reduction along with select logic disabling provides the best performance both in terms of energy reduction and energy-delay improvement (14% and 9% respectively for 14 stages, and 17% and 12% respectively for 28 stages).","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128498355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-sensitive cache replacement algorithms","authors":"Jaeheon Jeong, M. Dubois","doi":"10.1109/HPCA.2003.1183550","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183550","url":null,"abstract":"Cache replacement algorithms originally developed in the context of simple uniprocessor systems aim to reduce the miss count. However, in modern systems, cache misses have different costs. The cost may be latency, penalty, power consumption, bandwidth consumption, or any other ad-hoc numerical property attached to a miss. In many practical situations, it is desirable to inject the cost of a miss into the replacement policy. In this paper, we propose several extensions of LRU which account for nonuniform miss costs. These LRU extensions have simple implementations, yet they are very effective in various situations. We first explore the simple case of two static miss costs using trace-driven simulations to understand when cost-sensitive replacements are effective. We show that very large improvements of the cost function are possible in many practical cases. As an example of their effectiveness, we apply the algorithms to the second-level cache of a multiprocessor with superscalar processors, using the miss latency as the cost function. By applying our simple replacement policies sensitive to the latency of misses we can improve the execution time of some parallel applications by up to 18%.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"476 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116234942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic optimization of micro-operations","authors":"Brian Slechta, David Crowe, Brian Fahs, M. Fertig, Gregory A. Muthler, Justin Quek, Francesco Spadini, Sanjay J. Patel, S. Lumetta","doi":"10.1109/HPCA.2003.1183535","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183535","url":null,"abstract":"Inherent within complex instruction set architectures such as /spl times/86 are inefficiencies that do not exist in a simpler ISA. Modern /spl times/86 implementations decode instructions into one or more micro-operations in order to deal with the complexity of the ISA. Since these micro-operations are not visible to the compiler the stream of micro-operations can contain redundancies even in statically optimized /spl times/86 code. Within a processor implementation, however barriers at the ISA level do not apply, and these redundancies can be removed by optimizing the micro-operation stream. In this paper we explore the opportunities to optimize code at the micro-operation granularity. We execute these micro-operation optimizations using the rePLay Framework as a microarchitectural substrate. Using a simple set of seven optimizations, including two that aggressively and speculatively attempt to remove redundant load instructions, we examine the effects of dynamic optimization of micro-operations using a trace-driven simulation environment. Simulation reveals that across a sampling of SPECint 2000 and real /spl times/86 applications, rePLay is able to reduce micro-operation count by 21% and, in particular load micro-operation count by 22%. These reductions correspond to a boost in observed instruction-level parallelism on an 8-wide optimizing rePLay processor by 17% over a non-optimizing configuration.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132000001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Front-end policies for improved issue efficiency in SMT processors","authors":"A. El-Moursy, D. Albonesi","doi":"10.1109/HPCA.2003.1183522","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183522","url":null,"abstract":"The performance and power optimization of dynamic superscalar microprocessors requires striking a careful balance between exploiting parallelism and hardware simplification. Hardware structures which are needlessly complex may exacerbate critical timing paths and dissipate extra power. One such structure requiring careful design is the issue queue. In a simultaneous multi-threading (SMT) processor it is particularly challenging to achieve issue queue simplification due to the increased utilization of the queue afforded by multi-threading. In this paper we propose new front-end policies that reduce the required integer and floating point issue queue sizes in SMT processors. We explore both general policies as well as those directed towards alleviating a particular cause of issue queue inefficiency. For the same level of performance, the most effective policies reduce the issue queue occupancy by 33% for an SMT processor with appropriately sized issue queue resources.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128093856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic data replication: an approach to providing fault-tolerant shared memory clusters","authors":"Rosalia Christodoulopoulou, R. Azimi, A. Bilas","doi":"10.1109/HPCA.2003.1183538","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183538","url":null,"abstract":"A challenging issue in today's server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address this problem in shared virtual memory (SVM) clusters at the programming abstraction layer. We design extensions to an existing SVM protocol that has been tuned for low-latency, high-bandwidth interconnects and SMP nodes and we achieve reliability through dynamic replication of application shared data and protocol information. Our extensions allow us to tolerate single (or multiple, but not simultaneous) node failures. We implement our extensions on a state-of-the-art cluster and we evaluate the common, failure-free case. We find that, although the complexity of our protocol is substantially higher than its failure-free counterpart, by taking advantage of architectural features of modern systems our approach imposes low overhead and can be employed for transparently dealing with system failures.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128000489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variability in architectural simulations of multi-threaded workloads","authors":"Alaa R. Alameldeen, D. Wood","doi":"10.1109/HPCA.2003.1183520","DOIUrl":"https://doi.org/10.1109/HPCA.2003.1183520","url":null,"abstract":"Multi-threaded commercial workloads implement many important Internet services. Consequently, these workloads are increasingly used to evaluate the performance of uniprocessor and multiprocessor system designs. This paper identifies performance variability as a potentially major challenge for architectural simulation studies using these workloads. Variability refers to the differences between multiple estimates of a workload's performance. Time variability occurs when a workload exhibits different characteristics during different phases of a single run. Space variability occurs when small variations in timing cause runs starting from the same initial condition to follow widely different execution paths. Variability is a well-known phenomenon in real systems, but is nearly universally ignored in simulation experiments. In a central result of this paper we show that variability in multi-threaded commercial workloads can lead to incorrect architectural conclusions (e.g., 31% of the time in one experiment). We propose a methodology, based on multiple simulations and standard statistical techniques, to compensate for variability. Our methodology greatly reduces the probability of reaching incorrect conclusions, while enabling simulations to finish within reasonable time limits.","PeriodicalId":150992,"journal":{"name":"The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117028398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}