标记清除垃圾收集的软件预取:硬件分析和软件重新设计

ASPLOS XI Pub Date : 2004-10-07 DOI:10.1145/1024393.1024417

Chen-Yong Cher, Antony Lloyd Hosking, T. N. Vijaykumar

{"title":"标记清除垃圾收集的软件预取:硬件分析和软件重新设计","authors":"Chen-Yong Cher, Antony Lloyd Hosking, T. N. Vijaykumar","doi":"10.1145/1024393.1024417","DOIUrl":null,"url":null,"abstract":"Tracing garbage collectors traverse references from live program variables, transitively tracing out the closure of live objects. Memory accesses incurred during tracing are essentially random: a given object may contain references to any other object. Since application heaps are typically much larger than hardware caches, tracing results in many cache misses. Technology trends will make cache misses more important, so tracing is a prime target for prefetching.Simulation of Java benchmarks running with the Boehm-De-mers-Weiser mark-sweep garbage collector for a projected hardware platform reveal high tracing overhead (up to 65% of elapsed time), and that cache misses are a problem. Applying Boehm's default prefetching strategy yields improvements in execution time (16% on average with incremental/generational collection for GC-intensive benchmarks), but analysis shows that his strategy suffers from significant timing problems: prefetches that occur too early or too late relative to their matching loads. This analysis drives development of a new prefetching strategy that yields up to three times the performance improvement of Boehm's strategy for GC-intensive benchmark (27% average speedup), and achieves performance close to that of perfect timing ie, few misses for tracing accesses) on some benchmarks. Validating these simulation results with live runs on current hardware produces average speedup of 6% for the new strategy on GC-intensive benchmarks with a GC configuration that tightly controls heap growth. In contrast, Boehm's default prefetching strategy is ineffective on this platform.","PeriodicalId":344295,"journal":{"name":"ASPLOS XI","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":"{\"title\":\"Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign\",\"authors\":\"Chen-Yong Cher, Antony Lloyd Hosking, T. N. Vijaykumar\",\"doi\":\"10.1145/1024393.1024417\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Tracing garbage collectors traverse references from live program variables, transitively tracing out the closure of live objects. Memory accesses incurred during tracing are essentially random: a given object may contain references to any other object. Since application heaps are typically much larger than hardware caches, tracing results in many cache misses. Technology trends will make cache misses more important, so tracing is a prime target for prefetching.Simulation of Java benchmarks running with the Boehm-De-mers-Weiser mark-sweep garbage collector for a projected hardware platform reveal high tracing overhead (up to 65% of elapsed time), and that cache misses are a problem. Applying Boehm's default prefetching strategy yields improvements in execution time (16% on average with incremental/generational collection for GC-intensive benchmarks), but analysis shows that his strategy suffers from significant timing problems: prefetches that occur too early or too late relative to their matching loads. This analysis drives development of a new prefetching strategy that yields up to three times the performance improvement of Boehm's strategy for GC-intensive benchmark (27% average speedup), and achieves performance close to that of perfect timing ie, few misses for tracing accesses) on some benchmarks. Validating these simulation results with live runs on current hardware produces average speedup of 6% for the new strategy on GC-intensive benchmarks with a GC configuration that tightly controls heap growth. In contrast, Boehm's default prefetching strategy is ineffective on this platform.\",\"PeriodicalId\":344295,\"journal\":{\"name\":\"ASPLOS XI\",\"volume\":\"61 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-10-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"26\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ASPLOS XI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1024393.1024417\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ASPLOS XI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1024393.1024417","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 26

摘要

跟踪垃圾收集器遍历来自活动程序变量的引用，传递地跟踪活动对象的闭包。跟踪期间产生的内存访问基本上是随机的:给定对象可能包含对任何其他对象的引用。由于应用程序堆通常比硬件缓存大得多，跟踪会导致许多缓存丢失。技术趋势将使缓存丢失变得更加重要，因此跟踪是预取的主要目标。对使用Boehm-De-mers-Weiser标记-清除垃圾收集器运行的Java基准测试的模拟显示，跟踪开销很高(高达运行时间的65%)，并且缓存丢失是一个问题。应用Boehm的默认预取策略可以提高执行时间(对于gc密集型基准测试，使用增量/分代收集平均可以提高16%)，但分析表明，他的策略存在严重的时间问题:相对于匹配负载，预取发生得太早或太晚。这种分析推动了一种新的预取策略的开发，这种策略在gc密集型基准测试中产生的性能提高是Boehm策略的三倍(平均加速提高27%)，并且在某些基准测试中实现了接近完美计时的性能(即跟踪访问的失误很少)。通过在当前硬件上的实时运行验证这些模拟结果，在GC密集型基准测试中使用严格控制堆增长的GC配置，新策略的平均加速速度为6%。相比之下，Boehm的默认预取策略在这个平台上是无效的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Software prefetching for mark-sweep garbage collection: hardware analysis and software redesign

Tracing garbage collectors traverse references from live program variables, transitively tracing out the closure of live objects. Memory accesses incurred during tracing are essentially random: a given object may contain references to any other object. Since application heaps are typically much larger than hardware caches, tracing results in many cache misses. Technology trends will make cache misses more important, so tracing is a prime target for prefetching.Simulation of Java benchmarks running with the Boehm-De-mers-Weiser mark-sweep garbage collector for a projected hardware platform reveal high tracing overhead (up to 65% of elapsed time), and that cache misses are a problem. Applying Boehm's default prefetching strategy yields improvements in execution time (16% on average with incremental/generational collection for GC-intensive benchmarks), but analysis shows that his strategy suffers from significant timing problems: prefetches that occur too early or too late relative to their matching loads. This analysis drives development of a new prefetching strategy that yields up to three times the performance improvement of Boehm's strategy for GC-intensive benchmark (27% average speedup), and achieves performance close to that of perfect timing ie, few misses for tracing accesses) on some benchmarks. Validating these simulation results with live runs on current hardware produces average speedup of 6% for the new strategy on GC-intensive benchmarks with a GC configuration that tightly controls heap growth. In contrast, Boehm's default prefetching strategy is ineffective on this platform.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ASPLOS XI

自引率

0.00%

发文量