Memory System Performance: Latest Literature

On the importance of optimizing the configuration of stream prefetchers

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111591

I. Ganusov, Martin Burtscher

Citations: 14

Gated memory control for memory monitoring, leak detection and garbage collection

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111593

C. Ding, Chengliang Zhang, Xipeng Shen, M. Ogihara

Citations: 16

Recursive data structure profiling

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111585

Easwaran Raman, David I. August

Abstract: As the processor-memory performance gap increases, so does the need for aggressive data structure optimizations to reduce memory access latencies. Such optimizations require a better understanding of the memory behavior of programs. We propose a profiling technique called Recursive Data Structure Profiling to help better understand the memory access behavior of programs that use recursive data structures (RDS) such as lists and trees. An RDS profile captures the runtime behavior of the individual instances of recursive data structures. RDS profiling differs from other memory profiling techniques in its ability to aggregate information pertaining to an entire data structure instance, rather than merely capturing the behavior of individual loads and stores, thereby giving a more global view of a program's memory accesses. This paper describes a method for collecting an RDS profile without requiring any high-level program representation or type information. RDS profiling achieves this with manageable space and time overhead on a mixture of pointer-intensive benchmarks from the SPEC, Olden and other benchmark suites. To illustrate the potential of the RDS profile in providing a better understanding of memory accesses, we introduce a metric to quantify the notion of stability of an RDS instance. A stable RDS instance is one that undergoes very few changes to its structure between its initial creation and final destruction, making it an attractive candidate for certain data structure optimizations.

Citations: 35

A locality-improving dynamic memory allocator

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111594

Yi Feng, E. Berger

Citations: 57

Performance characteristics of MAUI: an intelligent memory system architecture

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111590

J. Teller, C. B. Silio, B. Jacob

Citations: 5

Application analysis using memory pressure

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111586

K. Sudeep, A. Gheith

Abstract: As the speeds of microprocessors continue to follow Moore's law, memory speeds keep lagging farther behind so as to make the "memory wall" more and more distinct. In order for a processor architect to be able to evaluate the right micro-architectural features for the design, a study of the memory behavior of the applications becomes essential. In this paper we present a new metric termed "memory pressure" that can be used to analyze the application's behavior and quantify the demand an application places on the memory subsystem. Memory pressure is characterized by four metrics: (1) value-computation-to-use delay, (2) condition-resolution-to-use delay, (3) address-computation-to-use delay, and (4) value-load-to-use delay. It acts as an indicator of the opportunity that caching, prefetching, speculative loads or other DRAM latency hiding techniques can provide to improve the performance of the application. We have analyzed a few synthetic benchmarks as well as a few scientific applications and have been able to identify the benefit of caches and prefetch techniques for these benchmarks. As we demonstrate in this paper, quantifying the memory pressure not only provides insight into which architectural features a designer should evaluate for optimal performance, but also provides tangible hints to the software designer to make changes to the application (algorithmic and structural) to improve the performance.

Citations: 1

Impact of modern memory subsystems on cache optimizations for stencil computations

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111589

S. Kamil, P. Husbands, L. Oliker, J. Shalf, K. Yelick

Citations: 117

Transparent pointer compression for linked data structures

Memory System Performance Pub Date : 2005-06-12 DOI: 10.1145/1111583.1111587

Chris Lattner, Vikram S. Adve

Citations: 37

Improving trace cache hit rates using the sliding window fill mechanism and fill select table

Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065902

M. Shaaban, Edward Mulrane

Citations: 0

Automatic blocking of QR and LU factorizations for locality

Memory System Performance Pub Date : 2004-06-08 DOI: 10.1145/1065895.1065898

Qing Yi, K. Kennedy, Haihang You, Keith Seymour, J. Dongarra

Abstract: QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provide manually blocked implementations of these algorithms, by automatically generating blocked versions of the computations, more benefit can be gained, such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures.
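
For readers unfamiliar with what "blocked" means in the abstract above, the following is a minimal hand-written sketch of a right-looking blocked LU factorization in plain Python. It is an illustration under stated assumptions, not the paper's method: it omits partial pivoting (so it needs nonzero pivots, e.g. a diagonally dominant matrix), it is not produced by dependence hoisting, and the names `blocked_lu`, `multiply_factors` and the `block` parameter are hypothetical.

```python
def blocked_lu(a, block=2):
    """Return the packed LU factors of a square matrix (no pivoting).

    The result holds U on and above the diagonal and the strictly
    lower part of unit-lower-triangular L below it. Assumes every
    pivot is nonzero; this sketch does NOT do partial pivoting.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]  # work on a copy
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # Step 1: unblocked factorization of the tall panel lu[k0:n][k0:k1].
        for k in range(k0, k1):
            for i in range(k + 1, n):
                lu[i][k] /= lu[k][k]
                for j in range(k + 1, k1):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # Step 2: forward substitution with the diagonal block of L
        # to obtain the block row of U to the right of the panel.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
        # Step 3: update the trailing submatrix with a block product.
        # This is the step that reuses each loaded block many times.
        for i in range(k1, n):
            for j in range(k1, n):
                lu[i][j] -= sum(lu[i][k] * lu[k][j] for k in range(k0, k1))
    return lu


def multiply_factors(lu):
    """Rebuild L @ U from the packed factors, for checking the result."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]  # unit diagonal of L
                total += l_ik * lu[k][j]
            out[i][j] = total
    return out


if __name__ == "__main__":
    matrix = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
    rebuilt = multiply_factors(blocked_lu(matrix, block=2))
    error = max(abs(rebuilt[i][j] - matrix[i][j])
                for i in range(3) for j in range(3))
    print("max reconstruction error:", error)
```

In this sketch, most of the arithmetic lands in step 3, a matrix-matrix product over a block of `block` columns, which is the part that benefits from cache reuse once the block size is matched to the cache.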

Citations: 20