Memory System Performance: Latest Papers, Page 3

The cache behaviour of large lazy functional programs on stock hardware

Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773044

N. Nethercote, A. Mycroft

Citations: 23

Older-first garbage collection in practice: evaluation in a Java Virtual Machine

Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773042

D. Stefanovic, Matthew Hertz, S. Blackburn, K. McKinley, J. E. B. Moss

Abstract: Until recently, the best performing copying garbage collectors used a generational policy which repeatedly collects the very youngest objects, copies any survivors to an older space, and then infrequently collects the older space. A previous study that used garbage-collection simulation pointed to potential improvements by using an Older-First copying garbage collection algorithm. The Older-First algorithm sweeps a fixed-sized window through the heap from older to younger objects, and avoids copying the very youngest objects which have not yet had sufficient time to die. We describe and examine here an implementation of the Older-First algorithm in the Jikes RVM for Java. This investigation shows that Older-First can perform as well as the simulation results suggested, and greatly improves total program performance when compared to using a fixed-size nursery generational collector. We further compare Older-First to a flexible-size nursery generational collector in which the nursery occupies all of the heap that does not contain older objects. In these comparisons, the flexible-nursery collector is occasionally the better of the two, but on average the Older-First collector performs the best.
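
The window sweep described in the abstract above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Jikes RVM implementation: the `Obj` class, its `live` flag, and the `older_first_step` function are hypothetical names introduced here, the heap is modelled as a Python list ordered from oldest to youngest, and liveness is given up front rather than found by tracing. It also assumes that survivors keep their age position and that the window restarts at the oldest end once it passes the youngest object.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    live: bool  # hypothetical flag; a real collector traces reachability


def older_first_step(heap, start, window):
    """Collect one fixed-size window of `heap` (ordered oldest -> youngest).

    Returns (new_heap, next_start). Dead objects inside the window are
    dropped, survivors keep their position, and the next window begins
    just past the survivors, i.e. toward younger objects.
    """
    if start >= len(heap):
        start = 0  # assumed reset: window passed the youngest object
    region = heap[start:start + window]
    survivors = [o for o in region if o.live]
    new_heap = heap[:start] + survivors + heap[start + window:]
    return new_heap, start + len(survivors)


if __name__ == "__main__":
    heap = [Obj("a", True), Obj("b", False), Obj("c", True), Obj("d", False)]
    pos = 0
    for _ in range(3):
        heap, pos = older_first_step(heap, pos, window=2)
        print([o.name for o in heap], "next window starts at", pos)
```

In a running program the mutator keeps allocating at the young end while the window advances, which is how the youngest objects get time to die before the window reaches them; the toy omits allocation for brevity.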

Citations: 48

A proposal for a new hardware cache monitoring architecture

Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773047

M. Schulz, J. Tao, Jürgen Jeitner, Wolfgang Karl

Abstract: The analysis of the memory access behavior of applications, an essential step for a successful cache optimization, is a complex task. It needs to be supported with appropriate tools and monitoring facilities. Currently, however, users can only rely on either simulation-based approaches, which deliver a large degree of detail but are restricted in their applicability, or on hardware counters embedded into processors, which can track only very few, mostly global events and hence provide only limited data. In this work a proposal for a novel hardware monitoring facility is presented which exhibits both the detail of traditional simulations and the low overhead of hardware counters. Like the latter approach, it is also targeted towards an implementation within the processor for direct and non-intrusive access to caches and memory buses. Unlike traditional counters, however, it delivers a detailed picture of the complete memory access behavior of applications. This is achieved by generating so-called memory access histograms, which show access frequencies in relation to the application's address space. Such spatial memory access information can then be used for efficient program optimization by focusing on the code and data segments which were found to exhibit poor cache behavior.
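
To make the idea of a memory access histogram concrete, the following sketch builds one in software from a list of addresses. It illustrates only the data product, not the proposed hardware: the function name `access_histogram`, the 64-byte bin size, and the sample trace are assumptions introduced here.

```python
from collections import Counter


def access_histogram(addresses, bin_size=64):
    """Count accesses per address-space bin.

    Each address is mapped to the base address of its `bin_size`-byte
    bin (64 bytes is an assumed, cache-line-like granularity).
    """
    hist = Counter()
    for addr in addresses:
        hist[(addr // bin_size) * bin_size] += 1
    return hist


if __name__ == "__main__":
    # Hypothetical trace: three accesses fall into the bin at 0x1000.
    trace = [0x1000, 0x1008, 0x1040, 0x1004, 0x2000]
    for base, count in sorted(access_histogram(trace).items()):
        print(hex(base), count)
```

Bins with unusually high counts point to the regions of the address space worth inspecting first when looking for poor cache behavior.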

Citations: 10

An efficient static analysis algorithm to detect redundant memory operations

Memory System Performance Pub Date : 2003-02-15 DOI: 10.1145/773146.773049

K. Cooper, Li Xu

Citations: 19

The performance advantage of applying compression to the memory system

Memory System Performance Pub Date : 2002-06-16 DOI: 10.1145/773146.773048

N. Mahapatra, Jiangjiang Liu, Krishnan Sundaresan

Abstract: The memory system stores information comprising primarily instructions and data and secondarily address information, such as cache tag fields. It interacts with the processor by supporting related traffic (again comprising addresses, instructions, and data). Continuing exponential growth in processor performance, combined with technology, architecture, and application trends, places enormous demands on the memory system to permit this information storage and exchange at high enough performance (i.e., to provide low-latency and high-bandwidth access to large amounts of information). This paper comprehensively analyzes the redundancy in the information (addresses, instructions, and data) stored and exchanged between the processor and the memory system and evaluates the potential of compression in improving performance of the memory system. Analysis of traces obtained with Sun Microsystems' Shade simulator simulating SPARC executables of nine integer and six floating-point programs in the SPEC CPU2000 benchmark suite yields impressive results. Well-designed compression schemes may provide benefits in performance that far outweigh the extra time and logic for compression and decompression. This will be more so in the future, since the speed and size of logic (which will be used to perform compression/decompression) are improving and are projected to improve at a much higher rate than those of interconnect (which will be used to communicate the information), both on-chip and off-chip.
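
As a rough illustration of what measuring redundancy can mean, the sketch below reports how much a byte buffer shrinks under a general-purpose compressor. This is an assumption-laden stand-in: the paper's own compression schemes are not described in this listing, `zlib` is used here only because it is in Python's standard library, and the function name `compression_ratio` and the sample buffer are hypothetical.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size.

    A ratio well above 1 indicates redundancy that a compressor can
    exploit; zlib is a software stand-in, not a hardware scheme.
    """
    if not data:
        raise ValueError("data must be non-empty")
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    # Hypothetical memory block: 4 KiB of zero-filled words.
    block = bytes(4096)
    print(f"ratio: {compression_ratio(block):.1f}")
```

Whether such a ratio turns into a performance gain depends on the compression and decompression latency, which this sketch does not model.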

Citations: 7