Memory System Performance: Latest Publications

An empirical performance analysis of commodity memories in commodity servers
Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065903
D. Kerbyson, M. Lang, G. Patino, Hossein Amidi
{"title":"An empirical performance analysis of commodity memories in commodity servers","authors":"D. Kerbyson, M. Lang, G. Patino, Hossein Amidi","doi":"10.1145/1065895.1065903","DOIUrl":"https://doi.org/10.1145/1065895.1065903","url":null,"abstract":"This work details a performance study of six different types of commodity memories in two commodity server nodes. A number of micro-benchmarks are used that measure low-level performance characteristics, as well as two applications representative of the ASC workload. The memories vary both in terms of performance, including latency and bandwidths, and in terms of their physical properties and manufacturer. The two server nodes analyzed were an Itanium-II Madison based system, and a Xeon based system. All memories can be used within both of these processing nodes. This allows the performance of the memories to be directly examined while keeping all other factors within a node the same (processor, motherboard, operating system etc.). The results of this study show that there can be a significant difference in application performance depending on the actual memory used - by as much as 20%. The achieved performance is a result of the integration of the memory into the node as well as how the applications actually utilize it.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116910987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
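The paper's micro-benchmarks measure low-level latency and bandwidth. The two standard measurement idioms (a sequential sweep for bandwidth, a dependent pointer chase for latency, which defeats prefetching) can be sketched as follows; this is illustrative Python rather than the tuned native code such studies actually use, and the function names and sizes are our own:

```python
import array
import random
import time

def bandwidth_mb_s(n_bytes=1 << 24):
    """Estimate sequential read bandwidth by summing a large array once."""
    buf = array.array("q", range(n_bytes // 8))  # 8-byte elements
    t0 = time.perf_counter()
    total = sum(buf)                             # one sequential pass
    dt = time.perf_counter() - t0
    return (n_bytes / dt) / 1e6                  # MB/s

def latency_ns(n=1 << 18):
    """Estimate access latency with a dependent pointer chase: each
    element holds the index of the next, so loads cannot overlap."""
    perm = list(range(n))
    random.shuffle(perm)
    next_idx = [0] * n
    i = perm[0]
    for j in perm[1:]:
        next_idx[i] = j
        i = j
    next_idx[i] = perm[0]                        # close the cycle
    t0 = time.perf_counter()
    i = 0
    for _ in range(n):
        i = next_idx[i]
    return (time.perf_counter() - t0) / n * 1e9  # ns per access
```

Running both on the same node with different DIMMs installed is, in miniature, the experiment the paper performs.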
Reuse-distance-based miss-rate prediction on a per instruction basis
Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065906
Changpeng Fang, S. Carr, Soner Önder, Zhenlin Wang
{"title":"Reuse-distance-based miss-rate prediction on a per instruction basis","authors":"Changpeng Fang, S. Carr, Soner Önder, Zhenlin Wang","doi":"10.1145/1065895.1065906","DOIUrl":"https://doi.org/10.1145/1065895.1065906","url":null,"abstract":"Feedback-directed optimization has become an increasingly important tool in designing and building optimizing compilers. Recently, reuse-distance analysis has shown much promise in predicting the memory behavior of programs over a wide range of data sizes. Reuse-distance analysis predicts program locality by experimentally determining locality properties as a function of the data size of a program, allowing accurate locality analysis when the program's data size changes.Prior work has established the effectiveness of reuse distance analysis in predicting whole-program locality and miss rates. In this paper, we show that reuse distance can also effectively predict locality and miss rates on a per instruction basis. Rather than predict locality by analyzing reuse distances for memory addresses alone, we relate those addresses to particular static memory operations and predict the locality of each instruction.Our experiments show that using reuse distance without cache simulation to predict miss rates of instructions is superior to using cache simulations on a single representative data set to predict miss rates on various data sizes. In addition, our analysis allows us to identify the critical memory operations that are likely to produce a significant number of cache misses for a given data size. 
With this information, compilers can target cache optimization specifically to the instructions that can benefit from such optimizations most.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129280134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
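The per-instruction bookkeeping can be sketched directly: tag each access with its static memory operation (PC), compute the reuse distance (distinct addresses since the last access to the same address), and predict a miss whenever the distance reaches the cache capacity. This is a naive O(N·M) sketch with names of our own; the paper's analysis is far more efficient and additionally predicts behavior across data sizes:

```python
from collections import defaultdict

def per_pc_reuse_distances(trace):
    """trace: list of (pc, addr) pairs. Returns {pc: [reuse distances]},
    where a first-time access has distance float('inf')."""
    last_use = {}              # addr -> position of its previous access
    history = []               # addresses in access order
    per_pc = defaultdict(list)
    for pos, (pc, addr) in enumerate(trace):
        if addr in last_use:
            # distinct addresses touched strictly between the two accesses
            d = len(set(history[last_use[addr] + 1:pos]))
            per_pc[pc].append(d)
        else:
            per_pc[pc].append(float("inf"))
        history.append(addr)
        last_use[addr] = pos
    return dict(per_pc)

def predicted_miss_rate(distances, cache_lines):
    """A fully associative LRU cache of `cache_lines` lines misses on
    exactly those accesses whose reuse distance is >= cache_lines."""
    misses = sum(1 for d in distances if d >= cache_lines)
    return misses / len(distances)
```

For example, in the trace a, b, c, a the final load of `a` has reuse distance 2, so it hits with 3 or more lines and misses with 2.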
Metrics and models for reordering transformations
Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065899
M. Strout, P. Hovland
{"title":"Metrics and models for reordering transformations","authors":"M. Strout, P. Hovland","doi":"10.1145/1065895.1065899","DOIUrl":"https://doi.org/10.1145/1065895.1065899","url":null,"abstract":"Irregular applications frequently exhibit poor performance on contemporary computer architectures, in large part because of their inefficient use of the memory hierarchy. Run-time data, and iteration-reordering transformations have been shown to improve the locality and therefore the performance of irregular benchmarks. This paper describes models for determining which combination of run-time data- and iteration-reordering heuristics will result in the best performance for a given dataset. We propose that the data- and iteration-reordering transformations be viewed as approximating minimal linear arrangements on two separate hypergraphs: a spatial locality hypergraph and a temporal locality hypergraph. Our results measure the efficacy of locality metrics based on these hypergraphs in guiding the selection of data-and iteration-reordering heuristics. We also introduce new iteration- and data-reordering heuristics based on the hypergraph models that result in better performance than do previous heuristics.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127430041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 47
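One of the simplest data-reordering heuristics in this space is consecutive packing (first-touch), which renumbers data items in the order the iteration sequence first touches them so that items used together end up adjacent in memory. A minimal sketch, with function names of our own (the paper's hypergraph-guided heuristics are more sophisticated):

```python
def first_touch_order(iterations, n_data):
    """iterations: list of iterations, each a list of data indices it
    accesses. Returns an old-index -> new-index map that places data
    items in first-touch order for this traversal."""
    new_id = {}
    for it in iterations:
        for d in it:
            if d not in new_id:
                new_id[d] = len(new_id)
    # untouched items keep their relative order at the end
    for d in range(n_data):
        new_id.setdefault(d, len(new_id))
    return new_id

def reorder(data, new_id):
    """Physically permute the data array according to the map."""
    out = [None] * len(data)
    for old, new in new_id.items():
        out[new] = data[old]
    return out
```

After the reorder, the iteration sequence walks the data array roughly front to back, improving spatial locality.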
Polar opposites: next generation languages and architectures
Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065900
K. McKinley
{"title":"Polar opposites: next generation languages and architectures","authors":"K. McKinley","doi":"10.1145/1065895.1065900","DOIUrl":"https://doi.org/10.1145/1065895.1065900","url":null,"abstract":"Future hardware technology is on a collision course with modern programming languages. Adoption of programming languages is rare and slow, but programmers are now embracing high-level object-oriented languages such as Java and C# due to their software engineering benefits which include (1) fast development through code reuse and garbage collection; (2) ease of maintenance through encapsulation and object-orientation; (3) reduced errors through type safety, pointer disciplines, and garbage collection; and (4) portability. These programs use small methods, dynamic class binding, heavy memory allocation, short-lived objects, and pointer data structures, and thus obscure parallelism, locality, and control flow, in direct conflict with hardware trends.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123361270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Instruction combining for coalescing memory accesses using global code motion
Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065897
M. Kawahito, H. Komatsu, T. Nakatani
{"title":"Instruction combining for coalescing memory accesses using global code motion","authors":"M. Kawahito, H. Komatsu, T. Nakatani","doi":"10.1145/1065895.1065897","DOIUrl":"https://doi.org/10.1145/1065895.1065897","url":null,"abstract":"Instruction combining is an optimization to replace a sequence of instructions with a more efficient instruction yielding the same result in a fewer machine cycles. When we use it for coalescing memory accesses, we can reduce the memory traffic by combining narrow memory references with contiguous addresses into a wider reference for taking advantage of a wide-bus architecture. Coalescing memory accesses can improve performance for two reasons: one by reducing the additional cycles required for moving data from caches to registers and the other by reducing the stall cycles caused by multiple outstanding memory access requests. Previous approaches for memory access coalescing focus only on array access instructions related to loop induction variables, and thus they miss many other opportunities. In this paper, we propose a new algorithm for instruction combining by applying global code motion to wider regions of the given program in search of more potential candidates. We implemented two optimizations for coalescing memory accesses, one combining two 32-bit integer loads and the other combining two single-precision floating-point loads, using our algorithm in the IBM Java™ JIT compiler for IA-64, and evaluated them by measuring the SPECjvm98 benchmark suite. In our experiment, we can improve the maximum performance by 5.5% with little additional compilation time overhead. Moreover, when we replace every declaration of double for an instance variable with float, we can improve the performance by 7.3% for the MolDyn benchmark in the JavaGrande benchmark suite. 
Our approach can be applied to a variety of architectures and to programming languages besides Java.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129484455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Programmer specified pointer independence
Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065905
D. Koes, M. Budiu, Girish Venkataramani
{"title":"Programmer specified pointer independence","authors":"D. Koes, M. Budiu, Girish Venkataramani","doi":"10.1145/1065895.1065905","DOIUrl":"https://doi.org/10.1145/1065895.1065905","url":null,"abstract":"Good alias analysis is essential in order to achieve high performance on modern processors, yet precise interprocedural analysis does not scale well. We present a source code annotation, #pragma independent, which provides precise pointer aliasing information to the compiler, and describe a tool which highlights the most important and most likely correct locations at which a programmer should insert these annotations. Using this tool we perform a limit study on the effectiveness of pointer independence in improving program performance through improved compilation.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121751144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
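The annotation asserts that two pointers never access the same memory, which is what lets the compiler reorder or parallelize their accesses. The runtime check a validation tool might perform can be sketched on array ranges; the representation and names below are ours, not the paper's:

```python
def independent(a, b):
    """a, b: (buffer_id, start, length) triples describing the region a
    'pointer' may access. Two regions are independent when they live in
    different buffers or their index ranges do not overlap."""
    buf_a, lo_a, len_a = a
    buf_b, lo_b, len_b = b
    if buf_a != buf_b:
        return True                      # distinct allocations never alias
    return lo_a + len_a <= lo_b or lo_b + len_b <= lo_a
```

This is the same contract C's `restrict` qualifier expresses statically: the programmer promises independence, and the compiler optimizes as if it were proved.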
From simulation to practice: cache performance study of a Prolog system
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773045
Ricardo Lopes, L. Castro, V. S. Costa
{"title":"From simulation to practice: cache performance study of a Prolog system","authors":"Ricardo Lopes, L. Castro, V. S. Costa","doi":"10.1145/773146.773045","DOIUrl":"https://doi.org/10.1145/773146.773045","url":null,"abstract":"Progress in Prolog applications requires ever better performance and scalability from Prolog implementation technology. Most modern Prolog systems are emulator-based. Best performance thus requires both good emulator design and good memory performance. Indeed, Prolog applications can often spend hundreds of megabytes of data, but there is little work on understanding and quantifying the interactions between Prolog programs and the memory architecture of modern computers.In a previous study of Prolog systems we have shown through simulation that Prolog applications usually, but not always, have good locality, both for deterministic and non-deterministic applications. We also showed that performance may strongly depend on garbage collection and on database operations. Our analysis left two questions unanswered: how well do our simulated results holds on actual hardware, and how much did our results depend on a specific configuration? In this work we use several simulation parameters and profiling counters to improve understanding of Prolog applications. 
We believe that our analysis is of interest to any system implementor who wants to understand his or her own system's memory performance.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130656837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Calculating stack distances efficiently
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773043
G. Almási, Calin Cascaval, D. Padua
{"title":"Calculating stack distances efficiently","authors":"G. Almási, Calin Cascaval, D. Padua","doi":"10.1145/773146.773043","DOIUrl":"https://doi.org/10.1145/773146.773043","url":null,"abstract":"This paper1 describes our experience using the stack processing algorithm [6] for estimating the number of cache misses in scientific programs. By using a new data structure and various optimization techniques we obtain instrumented run-times within 50 to 100 times the original optimized run-times of our benchmarks.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121721659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 132
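The stack processing algorithm maintains an LRU stack of addresses: each access's stack distance is the 1-based depth at which its address is found, and the access hits in any fully associative LRU cache with at least that many lines. A naive O(N²) sketch for clarity (the paper's contribution is precisely a data structure that avoids this cost):

```python
def stack_distances(trace):
    """Return the stack distance of every access in the trace;
    cold (first-time) accesses get float('inf')."""
    stack = []                               # most recently used first
    out = []
    for addr in trace:
        if addr in stack:
            depth = stack.index(addr) + 1    # 1-based depth in the stack
            stack.remove(addr)
        else:
            depth = float("inf")             # cold miss
        stack.insert(0, addr)                # addr becomes most recent
        out.append(depth)
    return out

def misses(trace, cache_lines):
    """One pass over the distances yields the miss count for *every*
    fully associative LRU cache size at once; here we query one size."""
    return sum(1 for d in stack_distances(trace) if d > cache_lines)
```

For the trace a, b, c, a the distances are inf, inf, inf, 3: the second access to `a` hits with 3 or more lines and misses with fewer.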
Automatic pool allocation for disjoint data structures
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773041
Chris Lattner, Vikram S. Adve
{"title":"Automatic pool allocation for disjoint data structures","authors":"Chris Lattner, Vikram S. Adve","doi":"10.1145/773146.773041","DOIUrl":"https://doi.org/10.1145/773146.773041","url":null,"abstract":"This paper presents an analysis technique and a novel program transformation that can enable powerful optimizations for entire linked data structures. The fully automatic transformation converts ordinary programs to use pool (aka region) allocation for heap-based data structures. The transformation relies on an efficient link-time interprocedural analysis to identify disjoint data structures in the program, to check whether these data structures are accessed in a type-safe manner, and to construct a Disjoint Data Structure Graph that describes the connectivity pattern within such structures. We present preliminary experimental results showing that the data structure analysis and pool allocation are effective for a set of pointer intensive programs in the Olden benchmark suite. To illustrate the optimizations that can be enabled by these techniques, we describe a novel pointer compression transformation and briefly discuss several other optimization possibilities for linked data structures.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124412545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36
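The runtime side of pool (region) allocation can be sketched as a bump-pointer arena per disjoint structure, released all at once; the API below is our own invention for illustration, while the paper's transformation identifies the structures and inserts the pools automatically at compile time:

```python
class Pool:
    """Bump-pointer region: objects belonging to one disjoint data
    structure are placed contiguously and freed together in O(1)."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.top = 0

    def alloc(self, size):
        """Carve `size` bytes off the region; returns the offset."""
        if self.top + size > len(self.buf):
            raise MemoryError("pool exhausted")
        off = self.top
        self.top += size
        return off

    def release_all(self):
        """Free the entire region at once; no per-object bookkeeping."""
        self.top = 0

# One pool per disjoint structure: nodes of the same list stay together,
# giving the spatial locality the transformation is after.
list_pool = Pool(1024)
a = list_pool.alloc(16)
b = list_pool.alloc(16)
```

Because all nodes of one structure share a small contiguous region, traversals touch far fewer cache lines and pages than with a general-purpose heap.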
Compiler-directed run-time monitoring of program data access
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773040
C. Ding, Y. Zhong
{"title":"Compiler-directed run-time monitoring of program data access","authors":"C. Ding, Y. Zhong","doi":"10.1145/773146.773040","DOIUrl":"https://doi.org/10.1145/773146.773040","url":null,"abstract":"Accurate run-time analysis has been expensive for complex programs, in part because most methods perform on all a data. Some applications require only partial reorganization. An example of this is off-loading infrequently used data from a mobile device. Complete monitoring is not necessary because not all accesses can reach the displaced data. To support partial monitoring, this paper presents a framework that includes a source-to-source C compiler and a run-time monitor. The compiler inserts run-time calls, which invoke the monitor during execution. To be selective, the compiler needs to identify relevant data and their access. It needs to analyze both the content and the location of monitored data. To reduce run-time overhead, the system uses a source-level interface, where the compiler transfers additional program information to reduce the workload of the monitor. The paper describes an implementation for general C programs. It evaluates different levels of data monitoring and their application on an SGI workstation and an Intel PC.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126314195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
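The effect of the compiler-inserted run-time calls can be imitated by hand: only accesses to the selected data structure invoke the monitor, while everything else runs uninstrumented. A sketch with names of our own (the paper's system does this insertion automatically for C source):

```python
class Monitor:
    """Run-time monitor: counts the accesses reported to it."""
    def __init__(self):
        self.reads = 0
        self.writes = 0

class MonitoredList:
    """Stand-in for compiler-inserted run-time calls: each access to
    this selected structure invokes the monitor before proceeding."""
    def __init__(self, data, monitor):
        self._data = list(data)
        self._mon = monitor

    def __getitem__(self, i):
        self._mon.reads += 1          # inserted call on every load
        return self._data[i]

    def __setitem__(self, i, v):
        self._mon.writes += 1         # inserted call on every store
        self._data[i] = v

mon = Monitor()
xs = MonitoredList([1, 2, 3], mon)
_ = xs[0] + xs[2]                     # two monitored reads
xs[1] = 9                             # one monitored write
```

Unselected data incurs no overhead at all, which is the point of partial monitoring.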