NSL-BLRL: efficient cache warmup for sampled processor simulation

Luk Van Ertvelde, Filip Hellebaut, L. Eeckhout, K. D. Bosschere
Published in: 39th Annual Simulation Symposium (ANSS'06), 2006-04-02
DOI: 10.1109/ANSS.2006.33 (https://doi.org/10.1109/ANSS.2006.33)
Citations: 17

Abstract

Architectural simulation is extremely time-consuming because contemporary benchmarks require simulating a huge number of instructions. Sampled simulation, which selects a number of samples from the complete benchmark execution, yields substantial speedups. However, one major issue must be addressed to minimize non-sampling bias: the hardware state at the beginning of each sample. This is well known in the literature as the cold-start problem. The hardware structures that suffer most from the cold-start problem are cache hierarchies. In this paper, we propose NSL-BLRL, which combines two previously proposed cache hierarchy warmup approaches, namely no-state-loss (NSL) and boundary line reuse latency (BLRL). The idea of NSL-BLRL is to warm up the cache hierarchy using a hardware state checkpoint that stores a truncated NSL stream. The NSL stream is a least-recently-used (LRU) stream of unique memory references in the pre-sample. This NSL stream is then truncated to form the NSL-BLRL warmup checkpoint; the truncation point is found by inspecting the sample to determine how far back in the pre-sample one needs to go to accurately warm up the hardware state for the given sample. Using SPEC CPU2000 benchmarks, we show that NSL-BLRL (i) is nearly as accurate as BLRL and NSL for sampled processor simulation, (ii) yields simulation time speedups of several orders of magnitude compared to BLRL, and (iii) is more space-efficient than NSL. We therefore conclude that NSL-BLRL is a highly efficient and accurate cache warmup strategy for sampled processor simulation.
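
The two steps described in the abstract (building an LRU-ordered stream of unique pre-sample references, then truncating it according to what the sample actually reuses) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `nsl_stream` and `truncate_nsl`, the representation of a trace as a list of block addresses, and the `coverage` threshold used to pick the truncation point are all assumptions made for illustration.

```python
import math


def nsl_stream(pre_sample):
    """Return the unique addresses of the pre-sample in LRU order
    (least recently used first), each paired with the position of
    its last reference in the pre-sample."""
    last_use = {}
    for pos, addr in enumerate(pre_sample):
        last_use.pop(addr, None)  # forget the older occurrence, if any
        last_use[addr] = pos      # re-insert as the most recently used
    return list(last_use.items())


def truncate_nsl(stream, pre_sample_len, sample, coverage=0.9):
    """Keep only the tail of the NSL stream that the sample appears to need.

    `coverage` is a hypothetical knob: the fraction of sample references
    reaching back into the pre-sample that the checkpoint should cover.
    """
    last_pos = dict(stream)
    seen = set()
    look_back = []
    for addr in sample:
        if addr in seen:
            continue  # reused inside the sample itself, no warmup needed
        seen.add(addr)
        if addr in last_pos:
            # distance from the pre-sample/sample boundary to the last use
            look_back.append(pre_sample_len - last_pos[addr])
    if not look_back:
        return []
    look_back.sort()
    idx = max(0, math.ceil(coverage * len(look_back)) - 1)
    cutoff = look_back[idx]
    # entries stay in LRU order, so replaying them warms the cache correctly
    return [addr for addr, pos in stream if pre_sample_len - pos <= cutoff]


if __name__ == "__main__":
    pre = [1, 2, 3, 1, 4, 2]
    stream = nsl_stream(pre)
    print(stream)
    print(truncate_nsl(stream, len(pre), [2, 4, 2, 9], coverage=1.0))
```

Replaying the truncated list in order into a cache model would then stand in for simulating the full pre-sample. Lowering `coverage` shrinks the checkpoint at the cost of leaving some sample references cold, which is the accuracy versus space trade-off the abstract alludes to.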