{"title":"硬件性能计数器可信吗?","authors":"Vincent M. Weaver, S. Mckee","doi":"10.1109/IISWC.2008.4636099","DOIUrl":null,"url":null,"abstract":"When creating architectural tools, it is essential to know whether the generated results make sense. Comparing a toolpsilas outputs against hardware performance counters on an actual machine is a common means of executing a quick sanity check. If the results do not match, this can indicate problems with the tool, unknown interactions with the benchmarks being investigated, or even unexpected behavior of the real hardware. To make future analyses of this type easier, we explore the behavior of the SPEC benchmarks with both dynamic binary instrumentation (DBI) tools and hardware counters. We collect retired instruction performance counter data from the full SPEC CPU 2000 and 2006 benchmark suites on nine different implementations of the times86 architecture. When run with no special preparation, hardware counters have a coefficient of variation of up to 1.07%. After analyzing results in depth, we find that minor changes to the experimental setup reduce observed errors to less than 0.002% for all benchmarks. The fact that subtle changes in how experiments are conducted can largely impact observed results is unexpected, and it is important that researchers using these counters be aware of the issues involved.","PeriodicalId":447179,"journal":{"name":"2008 IEEE International Symposium on Workload Characterization","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"105","resultStr":"{\"title\":\"Can hardware performance counters be trusted?\",\"authors\":\"Vincent M. Weaver, S. Mckee\",\"doi\":\"10.1109/IISWC.2008.4636099\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"When creating architectural tools, it is essential to know whether the generated results make sense. Comparing a toolpsilas outputs against hardware performance counters on an actual machine is a common means of executing a quick sanity check. If the results do not match, this can indicate problems with the tool, unknown interactions with the benchmarks being investigated, or even unexpected behavior of the real hardware. To make future analyses of this type easier, we explore the behavior of the SPEC benchmarks with both dynamic binary instrumentation (DBI) tools and hardware counters. We collect retired instruction performance counter data from the full SPEC CPU 2000 and 2006 benchmark suites on nine different implementations of the times86 architecture. When run with no special preparation, hardware counters have a coefficient of variation of up to 1.07%. After analyzing results in depth, we find that minor changes to the experimental setup reduce observed errors to less than 0.002% for all benchmarks. 
The fact that subtle changes in how experiments are conducted can largely impact observed results is unexpected, and it is important that researchers using these counters be aware of the issues involved.\",\"PeriodicalId\":447179,\"journal\":{\"name\":\"2008 IEEE International Symposium on Workload Characterization\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"105\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Symposium on Workload Characterization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IISWC.2008.4636099\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Symposium on Workload Characterization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2008.4636099","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 105
Abstract
When creating architectural tools, it is essential to know whether the generated results make sense. Comparing a tool's outputs against hardware performance counters on an actual machine is a common means of performing a quick sanity check. If the results do not match, this can indicate problems with the tool, unknown interactions with the benchmarks being investigated, or even unexpected behavior of the real hardware. To make future analyses of this type easier, we explore the behavior of the SPEC benchmarks with both dynamic binary instrumentation (DBI) tools and hardware counters. We collect retired-instruction performance counter data from the full SPEC CPU 2000 and 2006 benchmark suites on nine different implementations of the x86 architecture. When run with no special preparation, hardware counters show a coefficient of variation of up to 1.07%. After analyzing the results in depth, we find that minor changes to the experimental setup reduce observed errors to less than 0.002% for all benchmarks. The fact that subtle changes in how experiments are conducted can significantly affect observed results is unexpected, and it is important that researchers using these counters be aware of the issues involved.
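As a concrete illustration of the sanity check the abstract describes, the sketch below repeatedly measures a benchmark's retired instruction count and reports the run-to-run coefficient of variation (standard deviation divided by mean, as a percentage). It assumes a modern Linux system where `perf stat` is available; the paper predates Linux's perf tooling, so the `perf` command, the `instructions` event name, and the CSV parsing here are illustrative assumptions rather than the authors' measurement setup.

```python
#!/usr/bin/env python3
"""Minimal sketch: run-to-run variation of retired instruction counts.

Assumes Linux `perf stat` is installed and counters are accessible; this is
an illustrative stand-in for the counter interfaces used in the paper.
"""
import statistics
import subprocess
import sys


def retired_instruction_counts(cmd, runs=10):
    """Run `cmd` several times, returning the retired-instruction count per run."""
    counts = []
    for _ in range(runs):
        # `-x ,` selects CSV output on stderr: value,unit,event,...
        proc = subprocess.run(
            ["perf", "stat", "-e", "instructions", "-x", ","] + cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.PIPE,
            text=True,
            check=True,
        )
        for line in proc.stderr.splitlines():
            fields = line.split(",")
            # Keep only numeric counts for the instructions event.
            if len(fields) > 2 and fields[2].startswith("instructions") and fields[0].isdigit():
                counts.append(int(fields[0]))
                break
    return counts


def coefficient_of_variation(samples):
    """CoV = standard deviation / mean, reported as a percentage."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)


if __name__ == "__main__":
    samples = retired_instruction_counts(sys.argv[1:])
    print(f"CoV of retired instructions over {len(samples)} runs: "
          f"{coefficient_of_variation(samples):.4f}%")
```

Invoked as, for example, `python3 cov_check.py ./benchmark arg1` (the script name and benchmark are hypothetical), the reported CoV can be compared against a DBI tool's instruction count to perform the kind of cross-validation the paper studies.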