{"title":"An empirical performance analysis of commodity memories in commodity servers","authors":"D. Kerbyson, M. Lang, G. Patino, Hossein Amidi","doi":"10.1145/1065895.1065903","DOIUrl":"https://doi.org/10.1145/1065895.1065903","url":null,"abstract":"This work details a performance study of six different types of commodity memories in two commodity server nodes. A number of micro-benchmarks are used that measure low-level performance characteristics, as well as two applications representative of the ASC workload. The memories vary both in terms of performance, including latency and bandwidth, and in terms of their physical properties and manufacturer. The two server nodes analyzed were an Itanium-II (Madison) based system and a Xeon-based system. All memories can be used within both of these processing nodes. This allows the performance of the memories to be directly examined while keeping all other factors within a node the same (processor, motherboard, operating system, etc.). The results of this study show that there can be a significant difference in application performance depending on the actual memory used, by as much as 20%. The achieved performance is a result of the integration of the memory into the node as well as how the applications actually utilize it.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116910987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reuse-distance-based miss-rate prediction on a per instruction basis","authors":"Changpeng Fang, S. Carr, Soner Önder, Zhenlin Wang","doi":"10.1145/1065895.1065906","DOIUrl":"https://doi.org/10.1145/1065895.1065906","url":null,"abstract":"Feedback-directed optimization has become an increasingly important tool in designing and building optimizing compilers. Recently, reuse-distance analysis has shown much promise in predicting the memory behavior of programs over a wide range of data sizes. Reuse-distance analysis predicts program locality by experimentally determining locality properties as a function of the data size of a program, allowing accurate locality analysis when the program's data size changes. Prior work has established the effectiveness of reuse-distance analysis in predicting whole-program locality and miss rates. In this paper, we show that reuse distance can also effectively predict locality and miss rates on a per-instruction basis. Rather than predict locality by analyzing reuse distances for memory addresses alone, we relate those addresses to particular static memory operations and predict the locality of each instruction. Our experiments show that using reuse distance without cache simulation to predict miss rates of instructions is superior to using cache simulations on a single representative data set to predict miss rates on various data sizes. In addition, our analysis allows us to identify the critical memory operations that are likely to produce a significant number of cache misses for a given data size. With this information, compilers can target cache optimization specifically to the instructions that can benefit from such optimizations most.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129280134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metrics and models for reordering transformations","authors":"M. Strout, P. Hovland","doi":"10.1145/1065895.1065899","DOIUrl":"https://doi.org/10.1145/1065895.1065899","url":null,"abstract":"Irregular applications frequently exhibit poor performance on contemporary computer architectures, in large part because of their inefficient use of the memory hierarchy. Run-time data- and iteration-reordering transformations have been shown to improve the locality and therefore the performance of irregular benchmarks. This paper describes models for determining which combination of run-time data- and iteration-reordering heuristics will result in the best performance for a given dataset. We propose that the data- and iteration-reordering transformations be viewed as approximating minimal linear arrangements on two separate hypergraphs: a spatial locality hypergraph and a temporal locality hypergraph. Our results measure the efficacy of locality metrics based on these hypergraphs in guiding the selection of data- and iteration-reordering heuristics. We also introduce new iteration- and data-reordering heuristics based on the hypergraph models that result in better performance than do previous heuristics.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127430041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polar opposites: next generation languages and architectures","authors":"K. McKinley","doi":"10.1145/1065895.1065900","DOIUrl":"https://doi.org/10.1145/1065895.1065900","url":null,"abstract":"Future hardware technology is on a collision course with modern programming languages. Adoption of programming languages is rare and slow, but programmers are now embracing high-level object-oriented languages such as Java and C# due to their software engineering benefits, which include (1) fast development through code reuse and garbage collection; (2) ease of maintenance through encapsulation and object-orientation; (3) reduced errors through type safety, pointer disciplines, and garbage collection; and (4) portability. These programs use small methods, dynamic class binding, heavy memory allocation, short-lived objects, and pointer data structures, and thus obscure parallelism, locality, and control flow, in direct conflict with hardware trends.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123361270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instruction combining for coalescing memory accesses using global code motion","authors":"M. Kawahito, H. Komatsu, T. Nakatani","doi":"10.1145/1065895.1065897","DOIUrl":"https://doi.org/10.1145/1065895.1065897","url":null,"abstract":"Instruction combining is an optimization to replace a sequence of instructions with a more efficient instruction yielding the same result in fewer machine cycles. When we use it for coalescing memory accesses, we can reduce the memory traffic by combining narrow memory references with contiguous addresses into a wider reference to take advantage of a wide-bus architecture. Coalescing memory accesses can improve performance for two reasons: one by reducing the additional cycles required for moving data from caches to registers and the other by reducing the stall cycles caused by multiple outstanding memory access requests. Previous approaches for memory access coalescing focus only on array access instructions related to loop induction variables, and thus they miss many other opportunities. In this paper, we propose a new algorithm for instruction combining by applying global code motion to wider regions of the given program in search of more potential candidates. We implemented two optimizations for coalescing memory accesses, one combining two 32-bit integer loads and the other combining two single-precision floating-point loads, using our algorithm in the IBM Java™ JIT compiler for IA-64, and evaluated them by measuring the SPECjvm98 benchmark suite. In our experiment, we can improve the maximum performance by 5.5% with little additional compilation time overhead. Moreover, when we replace every declaration of double for an instance variable with float, we can improve the performance by 7.3% for the MolDyn benchmark in the JavaGrande benchmark suite. Our approach can be applied to a variety of architectures and to programming languages besides Java.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129484455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Programmer specified pointer independence","authors":"D. Koes, M. Budiu, Girish Venkataramani","doi":"10.1145/1065895.1065905","DOIUrl":"https://doi.org/10.1145/1065895.1065905","url":null,"abstract":"Good alias analysis is essential in order to achieve high performance on modern processors, yet precise interprocedural analysis does not scale well. We present a source code annotation, #pragma independent, which provides precise pointer aliasing information to the compiler, and describe a tool which highlights the most important and most likely correct locations at which a programmer should insert these annotations. Using this tool we perform a limit study on the effectiveness of pointer independence in improving program performance through improved compilation.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121751144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From simulation to practice: cache performance study of a Prolog system","authors":"Ricardo Lopes, L. Castro, V. S. Costa","doi":"10.1145/773146.773045","DOIUrl":"https://doi.org/10.1145/773146.773045","url":null,"abstract":"Progress in Prolog applications requires ever better performance and scalability from Prolog implementation technology. Most modern Prolog systems are emulator-based. Best performance thus requires both good emulator design and good memory performance. Indeed, Prolog applications can often consume hundreds of megabytes of data, but there is little work on understanding and quantifying the interactions between Prolog programs and the memory architecture of modern computers. In a previous study of Prolog systems we have shown through simulation that Prolog applications usually, but not always, have good locality, both for deterministic and non-deterministic applications. We also showed that performance may strongly depend on garbage collection and on database operations. Our analysis left two questions unanswered: how well do our simulated results hold on actual hardware, and how much did our results depend on a specific configuration? In this work we use several simulation parameters and profiling counters to improve understanding of Prolog applications. We believe that our analysis is of interest to any system implementor who wants to understand his or her own system's memory performance.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130656837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calculating stack distances efficiently","authors":"G. Almási, Calin Cascaval, D. Padua","doi":"10.1145/773146.773043","DOIUrl":"https://doi.org/10.1145/773146.773043","url":null,"abstract":"This paper describes our experience using the stack processing algorithm [6] for estimating the number of cache misses in scientific programs. By using a new data structure and various optimization techniques we obtain instrumented run-times within 50 to 100 times the original optimized run-times of our benchmarks.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121721659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic pool allocation for disjoint data structures","authors":"Chris Lattner, Vikram S. Adve","doi":"10.1145/773146.773041","DOIUrl":"https://doi.org/10.1145/773146.773041","url":null,"abstract":"This paper presents an analysis technique and a novel program transformation that can enable powerful optimizations for entire linked data structures. The fully automatic transformation converts ordinary programs to use pool (aka region) allocation for heap-based data structures. The transformation relies on an efficient link-time interprocedural analysis to identify disjoint data structures in the program, to check whether these data structures are accessed in a type-safe manner, and to construct a Disjoint Data Structure Graph that describes the connectivity pattern within such structures. We present preliminary experimental results showing that the data structure analysis and pool allocation are effective for a set of pointer intensive programs in the Olden benchmark suite. To illustrate the optimizations that can be enabled by these techniques, we describe a novel pointer compression transformation and briefly discuss several other optimization possibilities for linked data structures.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124412545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiler-directed run-time monitoring of program data access","authors":"C. Ding, Y. Zhong","doi":"10.1145/773146.773040","DOIUrl":"https://doi.org/10.1145/773146.773040","url":null,"abstract":"Accurate run-time analysis has been expensive for complex programs, in part because most methods monitor all data. Some applications require only partial reorganization. An example of this is off-loading infrequently used data from a mobile device. Complete monitoring is not necessary because not all accesses can reach the displaced data. To support partial monitoring, this paper presents a framework that includes a source-to-source C compiler and a run-time monitor. The compiler inserts run-time calls, which invoke the monitor during execution. To be selective, the compiler needs to identify the relevant data and their accesses. It needs to analyze both the content and the location of monitored data. To reduce run-time overhead, the system uses a source-level interface, where the compiler transfers additional program information to reduce the workload of the monitor. The paper describes an implementation for general C programs. It evaluates different levels of data monitoring and their application on an SGI workstation and an Intel PC.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126314195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}