Thermometer: profile-guided btb replacement for data center applications

Shixin Song, Tanvir Ahmed Khan, Sara Mahdizadeh-Shahri, Akshitha Sriraman, N. Soundararajan, S. Subramoney, Daniel A. Jiménez, Heiner Litz, Baris Kasikci
{"title":"Thermometer: profile-guided btb replacement for data center applications","authors":"Shixin Song, Tanvir Ahmed Khan, Sara Mahdizadeh-Shahri, Akshitha Sriraman, N. Soundararajan, S. Subramoney, Daniel A. Jiménez, Heiner Litz, Baris Kasikci","doi":"10.1145/3470496.3527430","DOIUrl":null,"url":null,"abstract":"Modern processors employ a decoupled frontend with Fetch Directed Instruction Prefetching (FDIP) to avoid frontend stalls in data center applications. However, the large branch footprint of data center applications precipitates frequent Branch Target Buffer (BTB) misses that prohibit FDIP from eliminating more than 40% of all frontend stalls. We find that the state-of-the-art BTB optimization techniques (e.g., BTB prefetching and replacement mechanisms) cannot eliminate these misses due to their inadequate understanding of branch reuse behavior in data center applications. In this paper, we first perform a comprehensive characterization of the branch behavior of data center applications, and determine that identifying optimal BTB replacement decisions requires considering both transient and holistic (i.e., across the entire execution) branch behavior. We then present Thermometer, a novel BTB replacement technique that realizes the holistic branch behavior via a profile-guided analysis. Based on the collected profile, Thermometer generates useful BTB replacement hints that the underlying hardware can leverage. We evaluate Thermometer using 13 widely-used data center applications and demonstrate that it provides an average speedup of 8.7% (0.4%-64.9%) while outperforming the state-of-the-art BTB replacement techniques by 5.6× (on average, the best performing prior work achieves 1.5% speedup). We also demonstrate that Thermometer achieves a performance speedup that is, on average, 83.6% of the speedup achieved by the optimal BTB replacement policy.","PeriodicalId":337932,"journal":{"name":"Proceedings of the 49th Annual International Symposium on Computer Architecture","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 49th Annual International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3470496.3527430","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Modern processors employ a decoupled frontend with Fetch Directed Instruction Prefetching (FDIP) to avoid frontend stalls in data center applications. However, the large branch footprint of data center applications precipitates frequent Branch Target Buffer (BTB) misses that prohibit FDIP from eliminating more than 40% of all frontend stalls. We find that the state-of-the-art BTB optimization techniques (e.g., BTB prefetching and replacement mechanisms) cannot eliminate these misses due to their inadequate understanding of branch reuse behavior in data center applications. In this paper, we first perform a comprehensive characterization of the branch behavior of data center applications, and determine that identifying optimal BTB replacement decisions requires considering both transient and holistic (i.e., across the entire execution) branch behavior. We then present Thermometer, a novel BTB replacement technique that realizes the holistic branch behavior via a profile-guided analysis. Based on the collected profile, Thermometer generates useful BTB replacement hints that the underlying hardware can leverage. We evaluate Thermometer using 13 widely-used data center applications and demonstrate that it provides an average speedup of 8.7% (0.4%-64.9%) while outperforming the state-of-the-art BTB replacement techniques by 5.6× (on average, the best performing prior work achieves 1.5% speedup). We also demonstrate that Thermometer achieves a performance speedup that is, on average, 83.6% of the speedup achieved by the optimal BTB replacement policy.
温度计:数据中心应用的配置文件引导btb替代品
现代处理器采用带取定向指令预取(FDIP)的解耦前端来避免数据中心应用程序中的前端停滞。然而,数据中心应用程序的庞大分支空间导致频繁的分支目标缓冲区(BTB)丢失,这使得FDIP无法消除超过40%的前端停机。我们发现最先进的BTB优化技术(例如,BTB预取和替换机制)无法消除这些缺失,因为它们对数据中心应用中的分支重用行为理解不足。在本文中,我们首先对数据中心应用程序的分支行为进行了全面的表征,并确定确定最佳的BTB替换决策需要考虑瞬时和整体(即整个执行过程)分支行为。然后,我们提出了温度计,一种新的BTB替代技术,通过配置文件引导分析实现整体分支行为。根据收集的配置文件,温度计生成有用的BTB替换提示,底层硬件可以利用这些提示。我们使用13个广泛使用的数据中心应用程序对温度计进行了评估,并证明它提供了8.7%(0.4%-64.9%)的平均加速,同时比最先进的BTB替换技术高出5.6倍(平均而言,表现最好的先前工作实现了1.5%的加速)。我们还证明了温度计实现的性能加速平均为最佳BTB替换策略所实现的83.6%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信