{"title":"Precise computer comparisons via statistical resampling methods","authors":"Bin Li, Shaoming Chen, Lu Peng","doi":"10.1109/ISPASS.2015.7095787","DOIUrl":null,"url":null,"abstract":"Performance variability, stemming from non-deterministic hardware and software behaviors or deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems which increases the difficulty of comparing computer performance metrics. Conventional methods use various measures (such as geometric mean) to quantify the performance of different benchmarks to compare computers without considering variability. This may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution and five-number-summary for performance evaluation. The results show that 1) the randomization test substantially improves our chance to identify the difference between performance comparisons when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g. ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of the computer performance and a five-number-summary to summarize computer performance. We illustrate the results and conclusion through detailed Monte Carlo simulation studies and real examples. Results show that our methods are precise and robust even when two computers have very similar performance metrics.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2015.7095787","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
Performance variability, stemming from non-deterministic hardware and software behaviors or deterministic behaviors such as measurement bias, is a well-known phenomenon of computer systems which increases the difficulty of comparing computer performance metrics. Conventional methods use various measures (such as geometric mean) to quantify the performance of different benchmarks to compare computers without considering variability. This may lead to wrong conclusions. In this paper, we propose three resampling methods for performance evaluation and comparison: a randomization test for a general performance comparison between two computers, bootstrapping confidence estimation, and an empirical distribution and five-number-summary for performance evaluation. The results show that 1) the randomization test substantially improves our chance to identify the difference between performance comparisons when the difference is not large; 2) bootstrapping confidence estimation provides an accurate confidence interval for the performance comparison measure (e.g. ratio of geometric means); and 3) when the difference is very small, a single test is often not enough to reveal the nature of the computer performance and a five-number-summary to summarize computer performance. We illustrate the results and conclusion through detailed Monte Carlo simulation studies and real examples. Results show that our methods are precise and robust even when two computers have very similar performance metrics.