Feedback directed optimization of TCMalloc

Sangho Lee, Teresa L. Johnson, Easwaran Raman
{"title":"TCMalloc的反馈定向优化","authors":"Sangho Lee, Teresa L. Johnson, Easwaran Raman","doi":"10.1145/2618128.2618131","DOIUrl":null,"url":null,"abstract":"TCMalloc [9] is an open-source memory allocator. Its use of thread-local caches of free objects enables most allocations/deallocations to be satisfied from thread-local heaps not requiring locks, making it a highly scalable memory allocator for multi-threaded applications. TCMalloc code contains several parameters that control the thread-local caches. The values of these parameters have been carefully chosen to provide good performance for the common case. However, as we will show, the optimal values of these parameters depend upon application-specific memory allocation behavior, so there is no one configuration that attains the optimal performance in all applications. In light of this, this paper presents a feedback-directed optimization of TCMalloc. The proposed optimization method targets the batch sizes, which determine the aggressiveness and timing of thread cache management mechanisms that move free objects between central and thread-local caches. It aims to tailor the batch sizes to application behavior, in order to make prefetching from the central cache aggressive enough to reduce unnecessary synchronization, without causing other performance problems due to excessive garbage collection of free objects in the thread caches. To this end, the optimization method observes a target application during a profile run and uses an iterative algorithm to compute batch sizes. Empirical results show that the proposed optimization results in up to 10% performance improvement over the default configuration on Google internal benchmark applications.","PeriodicalId":181419,"journal":{"name":"Proceedings of the workshop on Memory Systems Performance and Correctness","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Feedback directed optimization of TCMalloc\",\"authors\":\"Sangho Lee, Teresa L. Johnson, Easwaran Raman\",\"doi\":\"10.1145/2618128.2618131\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"TCMalloc [9] is an open-source memory allocator. Its use of thread-local caches of free objects enables most allocations/deallocations to be satisfied from thread-local heaps not requiring locks, making it a highly scalable memory allocator for multi-threaded applications. TCMalloc code contains several parameters that control the thread-local caches. The values of these parameters have been carefully chosen to provide good performance for the common case. However, as we will show, the optimal values of these parameters depend upon application-specific memory allocation behavior, so there is no one configuration that attains the optimal performance in all applications. In light of this, this paper presents a feedback-directed optimization of TCMalloc. The proposed optimization method targets the batch sizes, which determine the aggressiveness and timing of thread cache management mechanisms that move free objects between central and thread-local caches. It aims to tailor the batch sizes to application behavior, in order to make prefetching from the central cache aggressive enough to reduce unnecessary synchronization, without causing other performance problems due to excessive garbage collection of free objects in the thread caches. 
To this end, the optimization method observes a target application during a profile run and uses an iterative algorithm to compute batch sizes. Empirical results show that the proposed optimization results in up to 10% performance improvement over the default configuration on Google internal benchmark applications.\",\"PeriodicalId\":181419,\"journal\":{\"name\":\"Proceedings of the workshop on Memory Systems Performance and Correctness\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the workshop on Memory Systems Performance and Correctness\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2618128.2618131\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the workshop on Memory Systems Performance and Correctness","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2618128.2618131","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 25

Abstract

TCMalloc [9] is an open-source memory allocator. Its use of thread-local caches of free objects enables most allocations/deallocations to be satisfied from thread-local heaps not requiring locks, making it a highly scalable memory allocator for multi-threaded applications. TCMalloc code contains several parameters that control the thread-local caches. The values of these parameters have been carefully chosen to provide good performance for the common case. However, as we will show, the optimal values of these parameters depend upon application-specific memory allocation behavior, so there is no one configuration that attains the optimal performance in all applications. In light of this, this paper presents a feedback-directed optimization of TCMalloc. The proposed optimization method targets the batch sizes, which determine the aggressiveness and timing of thread cache management mechanisms that move free objects between central and thread-local caches. It aims to tailor the batch sizes to application behavior, in order to make prefetching from the central cache aggressive enough to reduce unnecessary synchronization, without causing other performance problems due to excessive garbage collection of free objects in the thread caches. To this end, the optimization method observes a target application during a profile run and uses an iterative algorithm to compute batch sizes. Empirical results show that the proposed optimization results in up to 10% performance improvement over the default configuration on Google internal benchmark applications.
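The mechanism the paper tunes can be pictured with a short sketch. The C++ below is a minimal illustration, not TCMalloc's actual implementation, and every class and member name (CentralFreeList, ThreadCache, RemoveRange, InsertRange, batch_size, max_length) is an assumption chosen for readability: each thread keeps a lock-free list of free objects per size class, and only when that list runs dry (or grows too long) does it exchange a batch of batch_size objects with the shared, mutex-protected central list.

```cpp
// Minimal sketch (NOT TCMalloc's code) of batch transfers between a
// per-thread free list and a shared central free list for one size class.
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

struct CentralFreeList {
    std::mutex mu;               // taken only on batch transfers
    std::vector<void*> objects;  // free objects of one size class

    // Move up to n objects into `out`; returns how many were actually moved.
    std::size_t RemoveRange(std::vector<void*>& out, std::size_t n) {
        std::lock_guard<std::mutex> lock(mu);
        std::size_t take = std::min(n, objects.size());
        out.insert(out.end(), objects.end() - take, objects.end());
        objects.resize(objects.size() - take);
        return take;
    }

    // Accept up to n objects back from a thread cache.
    void InsertRange(std::vector<void*>& in, std::size_t n) {
        std::lock_guard<std::mutex> lock(mu);
        std::size_t give = std::min(n, in.size());
        objects.insert(objects.end(), in.end() - give, in.end());
        in.resize(in.size() - give);
    }
};

struct ThreadCache {
    std::vector<void*> free_list;  // owned by one thread: no lock on the fast path
    std::size_t batch_size;        // the parameter the paper's method optimizes
    std::size_t max_length;        // flush threshold, e.g. a multiple of batch_size

    // Allocation: pop locally; on a miss, fetch a whole batch so the next
    // batch_size - 1 allocations touch no lock at all. A larger batch_size
    // means fewer trips to the central list, i.e. less synchronization.
    void* Allocate(CentralFreeList& central) {
        if (free_list.empty()) central.RemoveRange(free_list, batch_size);
        if (free_list.empty()) return nullptr;  // central list exhausted in this sketch
        void* obj = free_list.back();
        free_list.pop_back();
        return obj;
    }

    // Deallocation: push locally; once the list grows past max_length, return
    // one batch to the central cache. This is the "garbage collection" of
    // thread-cache free objects that the abstract warns can become excessive
    // if the batch size is mis-tuned.
    void Deallocate(void* obj, CentralFreeList& central) {
        free_list.push_back(obj);
        if (free_list.size() > max_length) central.InsertRange(free_list, batch_size);
    }
};
```

In this picture, the feedback-directed step described in the abstract amounts to choosing batch_size from allocation behavior observed in a profile run, iterating until prefetching from the central list is aggressive enough to cut lock traffic without triggering excessive flushing of free objects out of the thread caches.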