Feedback directed optimization of TCMalloc

Sangho Lee, Teresa L. Johnson, Easwaran Raman
{"title":"TCMalloc的反馈定向优化","authors":"Sangho Lee, Teresa L. Johnson, Easwaran Raman","doi":"10.1145/2618128.2618131","DOIUrl":null,"url":null,"abstract":"TCMalloc [9] is an open-source memory allocator. Its use of thread-local caches of free objects enables most allocations/deallocations to be satisfied from thread-local heaps not requiring locks, making it a highly scalable memory allocator for multi-threaded applications. TCMalloc code contains several parameters that control the thread-local caches. The values of these parameters have been carefully chosen to provide good performance for the common case. However, as we will show, the optimal values of these parameters depend upon application-specific memory allocation behavior, so there is no one configuration that attains the optimal performance in all applications. In light of this, this paper presents a feedback-directed optimization of TCMalloc. The proposed optimization method targets the batch sizes, which determine the aggressiveness and timing of thread cache management mechanisms that move free objects between central and thread-local caches. It aims to tailor the batch sizes to application behavior, in order to make prefetching from the central cache aggressive enough to reduce unnecessary synchronization, without causing other performance problems due to excessive garbage collection of free objects in the thread caches. To this end, the optimization method observes a target application during a profile run and uses an iterative algorithm to compute batch sizes. Empirical results show that the proposed optimization results in up to 10% performance improvement over the default configuration on Google internal benchmark applications.","PeriodicalId":181419,"journal":{"name":"Proceedings of the workshop on Memory Systems Performance and Correctness","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Feedback directed optimization of TCMalloc\",\"authors\":\"Sangho Lee, Teresa L. Johnson, Easwaran Raman\",\"doi\":\"10.1145/2618128.2618131\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"TCMalloc [9] is an open-source memory allocator. Its use of thread-local caches of free objects enables most allocations/deallocations to be satisfied from thread-local heaps not requiring locks, making it a highly scalable memory allocator for multi-threaded applications. TCMalloc code contains several parameters that control the thread-local caches. The values of these parameters have been carefully chosen to provide good performance for the common case. However, as we will show, the optimal values of these parameters depend upon application-specific memory allocation behavior, so there is no one configuration that attains the optimal performance in all applications. In light of this, this paper presents a feedback-directed optimization of TCMalloc. The proposed optimization method targets the batch sizes, which determine the aggressiveness and timing of thread cache management mechanisms that move free objects between central and thread-local caches. It aims to tailor the batch sizes to application behavior, in order to make prefetching from the central cache aggressive enough to reduce unnecessary synchronization, without causing other performance problems due to excessive garbage collection of free objects in the thread caches. 
To this end, the optimization method observes a target application during a profile run and uses an iterative algorithm to compute batch sizes. Empirical results show that the proposed optimization results in up to 10% performance improvement over the default configuration on Google internal benchmark applications.\",\"PeriodicalId\":181419,\"journal\":{\"name\":\"Proceedings of the workshop on Memory Systems Performance and Correctness\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the workshop on Memory Systems Performance and Correctness\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2618128.2618131\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the workshop on Memory Systems Performance and Correctness","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2618128.2618131","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 25

Abstract

TCMalloc [9] is an open-source memory allocator. Its use of thread-local caches of free objects enables most allocations/deallocations to be satisfied from thread-local heaps not requiring locks, making it a highly scalable memory allocator for multi-threaded applications. TCMalloc code contains several parameters that control the thread-local caches. The values of these parameters have been carefully chosen to provide good performance for the common case. However, as we will show, the optimal values of these parameters depend upon application-specific memory allocation behavior, so there is no one configuration that attains the optimal performance in all applications. In light of this, this paper presents a feedback-directed optimization of TCMalloc. The proposed optimization method targets the batch sizes, which determine the aggressiveness and timing of thread cache management mechanisms that move free objects between central and thread-local caches. It aims to tailor the batch sizes to application behavior, in order to make prefetching from the central cache aggressive enough to reduce unnecessary synchronization, without causing other performance problems due to excessive garbage collection of free objects in the thread caches. To this end, the optimization method observes a target application during a profile run and uses an iterative algorithm to compute batch sizes. Empirical results show that the proposed optimization results in up to 10% performance improvement over the default configuration on Google internal benchmark applications.
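The mechanism the paper tunes can be pictured with a short sketch. The C++ below is a minimal illustration, not TCMalloc's actual implementation, and every class and member name (CentralFreeList, ThreadCache, RemoveRange, InsertRange, batch_size, max_length) is an assumption chosen for readability: each thread keeps a lock-free list of free objects per size class, and only when that list runs dry (or grows too long) does it exchange a batch of batch_size objects with the shared, mutex-protected central list.

```cpp
// Minimal sketch (NOT TCMalloc's code) of batch transfers between a
// per-thread free list and a shared central free list for one size class.
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

struct CentralFreeList {
    std::mutex mu;               // taken only on batch transfers
    std::vector<void*> objects;  // free objects of one size class

    // Move up to n objects into `out`; returns how many were actually moved.
    std::size_t RemoveRange(std::vector<void*>& out, std::size_t n) {
        std::lock_guard<std::mutex> lock(mu);
        std::size_t take = std::min(n, objects.size());
        out.insert(out.end(), objects.end() - take, objects.end());
        objects.resize(objects.size() - take);
        return take;
    }

    // Accept up to n objects back from a thread cache.
    void InsertRange(std::vector<void*>& in, std::size_t n) {
        std::lock_guard<std::mutex> lock(mu);
        std::size_t give = std::min(n, in.size());
        objects.insert(objects.end(), in.end() - give, in.end());
        in.resize(in.size() - give);
    }
};

struct ThreadCache {
    std::vector<void*> free_list;  // owned by one thread: no lock on the fast path
    std::size_t batch_size;        // the parameter the paper's method optimizes
    std::size_t max_length;        // flush threshold, e.g. a multiple of batch_size

    // Allocation: pop locally; on a miss, fetch a whole batch so the next
    // batch_size - 1 allocations touch no lock at all. A larger batch_size
    // means fewer trips to the central list, i.e. less synchronization.
    void* Allocate(CentralFreeList& central) {
        if (free_list.empty()) central.RemoveRange(free_list, batch_size);
        if (free_list.empty()) return nullptr;  // central list exhausted in this sketch
        void* obj = free_list.back();
        free_list.pop_back();
        return obj;
    }

    // Deallocation: push locally; once the list grows past max_length, return
    // one batch to the central cache. This is the "garbage collection" of
    // thread-cache free objects that the abstract warns can become excessive
    // if the batch size is mis-tuned.
    void Deallocate(void* obj, CentralFreeList& central) {
        free_list.push_back(obj);
        if (free_list.size() > max_length) central.InsertRange(free_list, batch_size);
    }
};
```

In this picture, the feedback-directed step described in the abstract amounts to choosing batch_size from allocation behavior observed in a profile run, iterating until prefetching from the central list is aggressive enough to cut lock traffic without triggering excessive flushing of free objects out of the thread caches.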