统计抽样对当代工作负载的有效性:以SPEC CPU2017为例

2019 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2019-11-01 DOI:10.1109/IISWC47752.2019.9042114

Sarabjeet Singh, M. Awasthi

{"title":"统计抽样对当代工作负载的有效性:以SPEC CPU2017为例","authors":"Sarabjeet Singh, M. Awasthi","doi":"10.1109/IISWC47752.2019.9042114","DOIUrl":null,"url":null,"abstract":"New benchmark suites are constantly being released, with each one providing a much larger set of benchmarks, representing an ever-growing variety of workloads. Contemporary workloads are increasingly more complex in their computational and memory footprints. Most computer architecture research is based on the ability of researchers to simulate novel ideas with a variety of workloads representing the domain being researched. However, bigger and complex benchmarks suites have made it extremely impractical to simulate complete benchmarks from start to finish. As a result, architects are becoming increasingly dependent on statistical sampling techniques like SimPoints, which identify long, repetitive execution phases in benchmarks, and limit simulations to a few instances of these phases. These techniques present an inherent trade-off between simulation speed and accuracy. This work presents results and insights for determining the accuracy of simulation points for the SPEC CPU2017 suite, using Pin and PinPoints, which is an implementation of SimPoints for the x86 ISA. Our analysis concludes that carefully chosen simulation points faithfully represent the workload; we observe <1% variance in the instruction distribution between full runs and the ones using SimPoints, while reducing simulation time by ~750x. We also show that on average, just 12 phases can faithfully represent the 90th percentile of a benchmark's behavior, which can help reduce the overall simulation time by up to ~1297x. In addition, using performance statistics with native binaries on real hardware and from an architectural model of the same machine using SimPoints, we report good co-relations between the two on metrics such as CPI. Finally, we present cases like memory hierarchy explorations, where SimPoints should be used judiciously and with extreme caution in order to derive correct conclusions - inappropriately chosen SimPoint configurations can show large deviations in memory hierarchy behavior as compared to full runs, as reported by prior studies.","PeriodicalId":121068,"journal":{"name":"2019 IEEE International Symposium on Workload Characterization (IISWC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Efficacy of Statistical Sampling on Contemporary Workloads: The Case of SPEC CPU2017\",\"authors\":\"Sarabjeet Singh, M. Awasthi\",\"doi\":\"10.1109/IISWC47752.2019.9042114\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"New benchmark suites are constantly being released, with each one providing a much larger set of benchmarks, representing an ever-growing variety of workloads. Contemporary workloads are increasingly more complex in their computational and memory footprints. Most computer architecture research is based on the ability of researchers to simulate novel ideas with a variety of workloads representing the domain being researched. However, bigger and complex benchmarks suites have made it extremely impractical to simulate complete benchmarks from start to finish. As a result, architects are becoming increasingly dependent on statistical sampling techniques like SimPoints, which identify long, repetitive execution phases in benchmarks, and limit simulations to a few instances of these phases. These techniques present an inherent trade-off between simulation speed and accuracy. This work presents results and insights for determining the accuracy of simulation points for the SPEC CPU2017 suite, using Pin and PinPoints, which is an implementation of SimPoints for the x86 ISA. Our analysis concludes that carefully chosen simulation points faithfully represent the workload; we observe <1% variance in the instruction distribution between full runs and the ones using SimPoints, while reducing simulation time by ~750x. We also show that on average, just 12 phases can faithfully represent the 90th percentile of a benchmark's behavior, which can help reduce the overall simulation time by up to ~1297x. In addition, using performance statistics with native binaries on real hardware and from an architectural model of the same machine using SimPoints, we report good co-relations between the two on metrics such as CPI. Finally, we present cases like memory hierarchy explorations, where SimPoints should be used judiciously and with extreme caution in order to derive correct conclusions - inappropriately chosen SimPoint configurations can show large deviations in memory hierarchy behavior as compared to full runs, as reported by prior studies.\",\"PeriodicalId\":121068,\"journal\":{\"name\":\"2019 IEEE International Symposium on Workload Characterization (IISWC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Symposium on Workload Characterization (IISWC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IISWC47752.2019.9042114\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on Workload Characterization (IISWC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC47752.2019.9042114","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

新的基准套件不断发布，每个套件都提供了更大的基准集，代表了不断增长的各种工作负载。当代工作负载的计算和内存占用越来越复杂。大多数计算机体系结构的研究都是基于研究人员在不同工作负载下模拟新思想的能力，这些工作负载代表了所研究的领域。然而，更大、更复杂的基准套件使得从头到尾模拟完整的基准测试变得极其不切实际。因此，架构师越来越依赖于统计采样技术，比如SimPoints，它在基准测试中识别长时间的、重复的执行阶段，并将模拟限制在这些阶段的几个实例中。这些技术在模拟速度和准确性之间存在固有的权衡。这项工作提供了使用Pin和PinPoints确定SPEC CPU2017套件仿真点准确性的结果和见解，这是x86 ISA的SimPoints实现。我们的分析得出的结论是，精心选择的模拟点忠实地代表了工作量;我们观察到完整运行和使用SimPoints的指令分布之间的差异<1%，同时将模拟时间减少了约750倍。我们还表明，平均而言，仅12个阶段就可以忠实地表示基准行为的第90个百分位数，这可以帮助减少总体模拟时间高达1297x。此外，通过使用真实硬件上的本机二进制数据和使用SimPoints的同一机器的体系结构模型的性能统计数据，我们报告了两者在CPI等指标上良好的相互关系。最后，我们提出了一些类似内存层次结构探索的案例，在这些案例中，SimPoint应该谨慎使用，并非常谨慎地得出正确的结论——与完整运行相比，不适当选择的SimPoint配置可能会显示出内存层次结构行为的巨大偏差，正如之前的研究所报告的那样。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficacy of Statistical Sampling on Contemporary Workloads: The Case of SPEC CPU2017

New benchmark suites are constantly being released, with each one providing a much larger set of benchmarks, representing an ever-growing variety of workloads. Contemporary workloads are increasingly more complex in their computational and memory footprints. Most computer architecture research is based on the ability of researchers to simulate novel ideas with a variety of workloads representing the domain being researched. However, bigger and complex benchmarks suites have made it extremely impractical to simulate complete benchmarks from start to finish. As a result, architects are becoming increasingly dependent on statistical sampling techniques like SimPoints, which identify long, repetitive execution phases in benchmarks, and limit simulations to a few instances of these phases. These techniques present an inherent trade-off between simulation speed and accuracy. This work presents results and insights for determining the accuracy of simulation points for the SPEC CPU2017 suite, using Pin and PinPoints, which is an implementation of SimPoints for the x86 ISA. Our analysis concludes that carefully chosen simulation points faithfully represent the workload; we observe <1% variance in the instruction distribution between full runs and the ones using SimPoints, while reducing simulation time by ~750x. We also show that on average, just 12 phases can faithfully represent the 90th percentile of a benchmark's behavior, which can help reduce the overall simulation time by up to ~1297x. In addition, using performance statistics with native binaries on real hardware and from an architectural model of the same machine using SimPoints, we report good co-relations between the two on metrics such as CPI. Finally, we present cases like memory hierarchy explorations, where SimPoints should be used judiciously and with extreme caution in order to derive correct conclusions - inappropriately chosen SimPoint configurations can show large deviations in memory hierarchy behavior as compared to full runs, as reported by prior studies.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE International Symposium on Workload Characterization (IISWC)

自引率

0.00%

发文量