Memory System Performance最新文献

筛选
英文 中文
The cache behaviour of large lazy functional programs on stock hardware 大型惰性函数程序在现有硬件上的缓存行为
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773044
N. Nethercote, A. Mycroft
{"title":"The cache behaviour of large lazy functional programs on stock hardware","authors":"N. Nethercote, A. Mycroft","doi":"10.1145/773146.773044","DOIUrl":"https://doi.org/10.1145/773146.773044","url":null,"abstract":"Lazy functional programs behave differently from imperative programs and these differences extend to cache behaviour. We use hardware counters and a simple yet accurate execution cost model to analyse some large Haskell programs on the x86 architecture. The programs do not interact well with modern processors---L2 cache data miss stalls and branch misprediction stalls account for up to 60% and 32% of execution time respectively. Moreover, the program code exhibits little exploitable instruction-level parallelism.We then use simulation to pinpoint cache misses at the instruction level. With this information we apply prefetching to minimise the cost of write misses, speeding up Haskell programs by up to 22%. We conclude with more ideas for changing the Glasgow Haskell Compiler and its garbage collector to improve the cache performance of large programs.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130133010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
Older-first garbage collection in practice: evaluation in a Java Virtual Machine 实践中的老优先垃圾收集:Java虚拟机中的求值
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773042
D. Stefanovic, Matthew Hertz, S. Blackburn, K. McKinley, J. E. B. Moss
{"title":"Older-first garbage collection in practice: evaluation in a Java Virtual Machine","authors":"D. Stefanovic, Matthew Hertz, S. Blackburn, K. McKinley, J. E. B. Moss","doi":"10.1145/773146.773042","DOIUrl":"https://doi.org/10.1145/773146.773042","url":null,"abstract":"Until recently, the best performing copying garbage collectors used a generational policy which repeatedly collects the very youngest objects, copies any survivors to an older space, and then infrequently collects the older space. A previous study that used garbage-collection simulation pointed to potential improvements by using an Older-First copying garbage collection algorithm. The Older-First algorithm sweeps a fixed-sized window through the heap from older to younger objects, and avoids copying the very youngest objects which have not yet had sufficient time to die. We describe and examine here an implementation of the Older-First algorithm in the Jikes RVM for Java. This investigation shows that Older-First can perform as well as the simulation results suggested, and greatly improves total program performance when compared to using a fixed-size nursery generational collector. We further compare Older-First to a flexible-size nursery generational collector in which the nursery occupies all of the heap that does not contain older objects. In these comparisons, the flexible-nursery collector is occasionally the better of the two, but on average the Older-First collector performs the best.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132932155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 48
A proposal for a new hardware cache monitoring architecture 提出了一种新的硬件缓存监控体系结构
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773047
M. Schulz, J. Tao, Jürgen Jeitner, Wolfgang Karl
{"title":"A proposal for a new hardware cache monitoring architecture","authors":"M. Schulz, J. Tao, Jürgen Jeitner, Wolfgang Karl","doi":"10.1145/773146.773047","DOIUrl":"https://doi.org/10.1145/773146.773047","url":null,"abstract":"The analysis of the memory access behavior of applications, an essential step for a successful cache optimization, is a complex task. It needs to be supported with appropriate tools and monitoring facilities. Currently, however, users can only rely on either simulation based approaches, which deliver a large degree of detail but are restricted in their applicability, or on hardware counters embedded into processors, which allow to keep track of very few, mostly global events and hence only provide limited data.In this work a proposal for novel hardware monitoring facility is presented which exhibits both the details of traditional simulations and the low--overhead of hardware counters. Like the latter approach, it is also targeted towards an implementation within the processor for a direct and non--intrusive access to caches and memory busses. Unlike traditional counters, however, it delivers a detailed picture of the complete memory access behavior of applications. This is achieved by generating so--called memory access histograms, which show access frequencies in relation to the applications address space. Such spatial memory access information can then be used for efficient program optimization by focusing on the code and data segments which were found to exhibit a poor cache behavior.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127792930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
An efficient static analysis algorithm to detect redundant memory operations 一个有效的静态分析算法来检测冗余内存操作
Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773049
K. Cooper, Li Xu
{"title":"An efficient static analysis algorithm to detect redundant memory operations","authors":"K. Cooper, Li Xu","doi":"10.1145/773146.773049","DOIUrl":"https://doi.org/10.1145/773146.773049","url":null,"abstract":"As memory system performance becomes an increasingly dominant factor in overall system performance, it is important to optimize programs for memory related operations. This paper concerns static analysis to detect redundant memory operations and enable other compiler transformations to remove such redundant operations.We present an extended global value numbering algorithm to detect redundant memory operations. The key of the new algorithm is a novel SSA-based representation for memory state which allows accurate reasoning about memory state. Using this representation, the algorithm can trace values through memory operations to detect equivalence in the same way that it traces them through register-based scalar operations. Thus it discovers both redundancy involving scalar values and redundancy involving memory operations. The redundancy relation detected by the algorithm can then be used by traditional redundancy elimination transformations to remove those redundant operations.Experiments using a suite of realistic applications demonstrate the algorithm is powerful and fast. In practice, it has essentially linear time complexity.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129428715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
The performance advantage of applying compression to the memory system 在内存系统中应用压缩技术的性能优势
Memory System Performance Pub Date : 2002-06-16 DOI: 10.1145/773146.773048
N. Mahapatra, Jiangjiang Liu, Krishnan Sundaresan
{"title":"The performance advantage of applying compression to the memory system","authors":"N. Mahapatra, Jiangjiang Liu, Krishnan Sundaresan","doi":"10.1145/773146.773048","DOIUrl":"https://doi.org/10.1145/773146.773048","url":null,"abstract":"The memory system stores information comprising primarily instructions and data and secondarily address information, such as cache tag fields. It interacts with the processor by supporting related traffic (again comprising addresses, instructions, and data). Continuing exponential growth in processor performance, combined with technology, architecture, and application trends, place enormous demands on the memory system to permit this information storage and exchange at a high-enough performance (i.e., to provide low latency and high bandwidth access to large amounts of information). This paper comprehensively analyzes the redundancy in the information (addresses, instructions, and data) stored and exchanged between the processor and the memory system and evaluates the potential of compression in improving performance of the memory system. Analysis of traces obtained with Sun Microsystems' Shade simulator simulating SPARC executables of nine integer and six floating-point programs in the SPEC CPU2000 benchmark suite yield impressive results. Well-designed compression schemes may provide benefits in performance that far outweigh the extra time and logic for compression and decompression. This will be more so in the future since the speed and size of logic (which will be used to perform compression/decompression) are improving and are projected to improve at a much higher rate compared to those of interconnect (which will be used to communicate the information), both on-chip and off-chip.","PeriodicalId":365109,"journal":{"name":"Memory System Performance","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127227880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信