J. Yi, Resit Sendag, L. Eeckhout, A. Joshi, D. Lilja, L. John
Title: Evaluating Benchmark Subsetting Approaches
Venue: 2006 IEEE International Symposium on Workload Characterization
Publication date: 2006-12-01
DOI: 10.1109/IISWC.2006.302733 (https://doi.org/10.1109/IISWC.2006.302733)
Citations: 36
Abstract
To reduce simulation time to a tractable amount, or because of compilation (or other related) problems, computer architects often simulate only a subset of the benchmarks in a benchmark suite. However, if the architect chooses a subset of benchmarks that is not representative, the subsequent simulation results will, at best, be misleading or, at worst, lead to incorrect conclusions. To address this problem, computer architects have recently proposed several statistically-based approaches to subsetting a benchmark suite. While some of these approaches are well-grounded statistically, what has not yet been thoroughly evaluated for each approach is its: 1) absolute accuracy; 2) relative accuracy across a range of processor and memory subsystem enhancements; and 3) representativeness and coverage for a range of subset sizes. Specifically, this paper evaluates statistically-based subsetting approaches based on principal components analysis (PCA) and the Plackett and Burman (P&B) design, alongside prevailing approaches such as subsetting by integer vs. floating-point, by core- vs. memory-bound, by language, and at random. Our results show that the two statistically-based approaches, PCA and P&B, have the best absolute and relative accuracy for CPI and energy-delay product (EDP), produce the most representative subsets, and choose benchmark and input set pairs that are the best distributed across the benchmark space. To achieve a 5% absolute CPI and EDP error across a wide range of configurations, PCA and P&B typically need about 17 benchmark and input set pairs, while the other five approaches often choose more than 30 benchmark and input set pairs.
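As a rough illustration of the ideas the abstract names, the following sketch projects benchmark characteristics onto principal components, picks a spread-out subset, and measures the subset's absolute CPI error against the full suite. It is not the paper's procedure. The function names, the greedy farthest-point selection rule, the use of an arithmetic mean for CPI, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project rows of X (benchmarks x characteristics) onto the top principal components."""
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std           # standardize each characteristic
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def pick_subset(scores, k):
    """Greedy farthest-point selection in PCA space (an assumed rule, not the paper's)."""
    centroid = scores.mean(axis=0)
    # Start from the benchmark closest to the centroid of the suite.
    chosen = [int(np.argmin(np.linalg.norm(scores - centroid, axis=1)))]
    dist = np.linalg.norm(scores - scores[chosen[0]], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))           # benchmark farthest from everything chosen so far
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(scores - scores[nxt], axis=1))
    return chosen

def absolute_error(metric, subset):
    """Relative difference between the subset mean and the full-suite mean of a metric."""
    full = metric.mean()
    return abs(metric[subset].mean() - full) / full

# Synthetic stand-in data: 20 benchmark/input pairs, 6 characteristics each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
cpi = rng.uniform(0.5, 3.0, size=20)         # hypothetical per-benchmark CPI values

subset = pick_subset(pca_scores(X, 2), 5)
print("subset:", subset, "absolute CPI error:", absolute_error(cpi, subset))
```

Sweeping `k` and recording `absolute_error` at each size gives the kind of error-versus-subset-size curve that the 5% threshold in the abstract refers to, under the assumptions stated above.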