{"title":"Performance evaluation of exclusive cache hierarchies","authors":"Ying Zheng, B. Davis, M. Jordan","doi":"10.1109/ISPASS.2004.1291359","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291359","url":null,"abstract":"Memory hierarchy performance, specifically cache memory capacity, is a constraining factor in the performance of modern computers. This paper presents the results of two-level cache memory simulations and examines the impact of exclusive caching on system performance. Exclusive caching enables higher capacity with the same cache area by eliminating redundant copies. The experiments presented compare an exclusive cache hierarchy with an inclusive cache hierarchy utilizing similar L1 and L2 parameters. Experiments indicate that significant performance advantages can be gained for some benchmark through the use of an exclusive organization. The performance differences are illustrated using the L2 cache misses and execution time metrics. The most significant improvement shown is a 16% reduction in execution time, with an average reduction of 8% for the smallest cache configuration tested. With equal size victim buffer and victim cache for exclusive and inclusive cache hierarchies respectively, some benchmarks show increased execution time for exclusive caches because a victim cache can reduce conflict misses significantly while a victim buffer can introduce worst-case penalties. Considering the inconsistent performance improvement, the increased complexity of an exclusive cache hierarchy needs to be justified based upon the specifics of the application and system.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116059865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sockets Direct Protocol over InfiniBand in clusters: is it beneficial?","authors":"P. Balaji, S. Narravula, K. Vaidyanathan, S. Krishnamoorthy, Jiesheng Wu, D. Panda","doi":"10.1109/ISPASS.2004.1291353","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291353","url":null,"abstract":"The Sockets Direct Protocol (SDP) had been proposed recently in order to enable sockets based applications to take advantage of the enhanced features provided by InfiniBand architecture. In this paper, we study the benefits and limitations of an implementation of SDP. We first analyze the performance of SDP based on a detailed suite of micro-benchmarks. Next, we evaluate it on two different real application domains: (1) A multitier data-center environment and (2) A Parallel Virtual File System (PVFS). Our micro-benchmark results show that SDP is able to provide up to 2.7 times better bandwidth as compared to the native sockets implementation over InfiniBand (IPoIB) and significantly better latency for large message sizes. Our experimental results also show that SDP is able to achieve a considerably higher performance (improvement of up to 2.4 times) as compared to IPoIB in the PVFS environment. In the data-center environment, SDP outperforms IPoIB for large file transfers inspite of currently being limited by a high connection setup time. However, this limitation is entirely implementation specific and as the InfiniBand software and hardware products are rapidly maturing, we expect this limitation to be overcome soon. Based on this, we have shown that the projected performance for SDP, without the connection setup time, can outperform IPoIB for small message transfers as well.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115475915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effectiveness of simple memory models for performance prediction","authors":"I. Tuduce, T. Gross","doi":"10.1109/ISPASS.2004.1291361","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291361","url":null,"abstract":"Many situations call for an estimation of the execution time of applications, e.g., during design or evaluation of computer systems. In this paper we focus on large applications where the execution times heavily depend on the performance of the memory system. Since such applications are computationally expensive, direct simulation is not an option and an analytical model is called for. This paper addresses this problem by developing and evaluating two simple analytical models. These models focus on an application's interaction with the memory system. Applications are characterized by their memory access types. A regular application has continuous and stride memory accesses. An irregular application has three memory access types: continuous accesses, accesses within the same L1/L2 cache line, and random accesses. The analytical models are combined with results from micro-benchmarking or with appropriate performance estimates of memory accesses to predict application performance, either on real or future machines. We apply these models to executions of CHARMM (Chemistry at HARvard Molecular Mechanics) - a scientific application written in FORTRAN, SMV (Symbolic Model Verifier) - coded in C++. For all three applications, the approaches described here produce results with 5% accuracy on average (compared to the effective run-time measured on a real SPARC system).","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132564306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamically reducing pressure on the physical register file through simple register sharing","authors":"Liem Tran, Nicholas Nelson, Fung Ngai, S. Dropsho, Michael C. Huang","doi":"10.1109/ISPASS.2004.1291358","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291358","url":null,"abstract":"Using register renaming and physical registers, modern microprocessors eliminate false data dependences from reuse of the instruction set defined registers (logical registers). High performance processors that have longer pipelines and a greater capacity to exploit instruction-level parallelism have more instructions in-flight and require more physical registers. Simultaneous multithreading architectures further exacerbate this register pressure. This paper evaluates two register sharing techniques for reducing register usage. The first technique dynamically combines physical registers having the same value the second technique combines the demand of several instructions updating the same logical register and share physical register storage among them. While similar techniques have been proposed previously, an important contribution of this paper is to exploit only special cases that provide most of the benefits of more general solutions but at a very low hardware complexity. Despite the simplicity, our design reduces the required number of physical registers by more than 10% on some applications, and provides almost half of the total benefits of an aggressive (complex) scheme. More importantly, we show the simpler design to reduce register pressure has significant performance effects in a simultaneous multithreaded (SMT) architecture where register availability can be a bottleneck. Our results show an average of 25.6% performance improvement for an SMT architecture with 160 registers or, equivalently, similar performance as an SMT with 200 registers (25% more) but no register sharing.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131318522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization of the data access behavior for TPC-C traces","authors":"R. Bonilla-Lucas, P. Plachta, Aamer Sachedina, Daniel Jiménez-González, C. Zuzarte, J. Larriba-Pey","doi":"10.1109/ISPASS.2004.1291363","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291363","url":null,"abstract":"In this paper, we look into the characteristics of the reference stream of TPC-C workloads from the buffer pool point of view. We analyze a trace coming from DB2 UDB version 8.1 fix pack 4 and compare it to a trace from DB2 UDB version 8.1 GA. We perform three types of analysis. A static analysis of the number of reads and writes for index and data pages. We conclude that index pages receive less references than data pages by are more frequently accessed individually. Then, we analyze how DB2 processes access those pages. Index pages have more references than data pages when accessed by more than one process. Finally, we understand the accesses along the life of a page. We conclude that there is a significant burstiness in the reference stream, where, each burst is caused by one process.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131286379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The future of simulation: A field of dreams","authors":"B. Calder, D. Citron, Y. Patt, James E. Smith","doi":"10.1109/ISPASS.2004.1291369","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291369","url":null,"abstract":"Quantitative evaluation of next-generation computer architectures and processor enhancements is possible only by running simulations. However, since the insights that are gained through simulation are predicated on the accuracy of the simulation results, and since the design decisions for future processor architectures -- which cost billions of dollars to design and implement -- are based on those insights, periodic examination of the simulation process becomes a necessity, rather than a luxury. Accordingly, this panel discusses the deficiencies of existing simulators, benchmarks, and simulation methodologies and techniques, and, in addition, what future directions are available for each.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116288349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectures and compilers for multimedia","authors":"W. Wolf","doi":"10.1109/ISPASS.2004.1291371","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291371","url":null,"abstract":"Summary form only given. The article covers architectures and compilers for multimedia systems. Multimedia applications impose real-time constraints on continuous media; they also include a surprisingly wide variety of algorithms. Many multimedia systems also operate under power/energy constraints. As such, multimedia computing systems are an important area of interest for ISPASS. This tutorial targets individuals with experience in hardware and software but who have limited expertise in multimedia. We start with an introduction to multimedia algorithms such as video and audio compression since the characteristics of these algorithms help to shape measurement strategies and architectural decisions. We then cover modern multimedia architectures and compilation techniques relevant to those architectures. We conclude with a case study drawn from our own research - the design of a multiprocessor system-on-chip for real-time gesture recognition.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129733220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structures for phase classification","authors":"Jeremy Lau, Stefan Schoenmackers, B. Calder","doi":"10.1109/ISPASS.2004.1291356","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291356","url":null,"abstract":"Most programs are repetitive, where similar behavior can be seen at different execution times. Proposed algorithms automatically group these similar intervals of execution into phases, where all he intervals in a phase have homogeneous behavior and similar resource requirements. In this paper we examine different program structures for capturing phase behavior. The goal is to compare the size and accuracy of these structures for performing phase classification. We focus on profiling the frequency of program level structures that are independent from underlying architecture performance metrics. This allows the phase classification to be used across different hardware designs that support the same instruction set (ISA). We compare using basic blocks, loop branches, procedures, opcodes, register usage, and memory address information for guiding phase classification. We compare these different structures in terms of their ability to create homogeneous phases, and evaluate the accuracy of using these structures to pick simulation points for SimPoint.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132790511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The BlueGene/L pseudo cycle-accurate simulator","authors":"Leonardo R. Bachega, J. Brunheroto, L. D. Rose, Pedro Mindlin, J. Moreira","doi":"10.1109/ISPASS.2004.1291354","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291354","url":null,"abstract":"The design and development of a new computer system is a lengthy process, with a considerable amount of time elapsed between the beginning of development and first hardware availability. Hence, fast and reasonably accurate simulation of processor architecture has become critical as an enabling mechanism for software engineers to develop and tune system software and applications. In this paper, we present the time-stamped timing model extensions to the BlueGene/L functional simulator. These extensions were implemented to create a pseudo cycle-accurate simulator capable of providing tracing capabilities for detection of bottlenecks and for performance tuning of applications, before the actual hardware became available. Our validation tests, using the DAXPY kernel and the serial version of the NAS benchmarks, show that our pseudo cycle-accurate simulator provides timing information within 15% of the times measured using the actual BlueGene/L hardware. In addition, we present a couple of case studies, which describes how this simulator can be used for identification of performance bottlenecks and for application tuning.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133838133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiler-directed physical address generation for reducing dTLB power","authors":"I. Kadayif, Partho Nath, M. Kandemir, A. Sivasubramaniam","doi":"10.1109/ISPASS.2004.1291368","DOIUrl":"https://doi.org/10.1109/ISPASS.2004.1291368","url":null,"abstract":"Address translation using the Translation Lookaside Buffer (TLB) consumes as much as 16% of the chip power on some processors because of its high associativity and access frequency. While prior work has looked into optimizing this structure at the circuit and architectural levels, this paper takes a different approach of optimizing its power by reducing the number of data TLB (dTLB) lookups for data references. The main idea is to keep translations in a set of translation registers, and intelligently use them in software to directly generate the physical addresses without going through the dTLB. The software has to work within the confines of the translation registers provided by the hardware, and has to maximize the reuse of such translations to be effective. We propose strategies and code transformations for achieving this in array-based and pointer-based codes, looking to optimize data accesses. Results with a suite of Spec95 array-based and pointer-based codes show dTLB energy savings of up to 73% and 88%, respectively, compared to directly using the dTLB for all references. Despite the small increase in instructions executed with our mechanisms, the approach can in fact provide performance benefits in certain cases.","PeriodicalId":188291,"journal":{"name":"IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004","volume":"397 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132167124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}