Efficacy of Statistical Sampling on Contemporary Workloads: The Case of SPEC CPU2017
Sarabjeet Singh, M. Awasthi
2019 IEEE International Symposium on Workload Characterization (IISWC), 2019-11-01
DOI: 10.1109/IISWC47752.2019.9042114 (https://doi.org/10.1109/IISWC47752.2019.9042114)
New benchmark suites are constantly being released, each providing a larger set of benchmarks that represents an ever-growing variety of workloads. Contemporary workloads are increasingly complex in their computational and memory footprints. Most computer architecture research depends on the ability of researchers to simulate novel ideas with a variety of workloads representing the domain being studied. However, larger and more complex benchmark suites have made it impractical to simulate complete benchmarks from start to finish. As a result, architects increasingly depend on statistical sampling techniques such as SimPoints, which identify long, repetitive execution phases in benchmarks and limit simulation to a few instances of these phases. These techniques present an inherent trade-off between simulation speed and accuracy. This work presents results and insights on the accuracy of simulation points for the SPEC CPU2017 suite, using Pin and PinPoints, an implementation of SimPoints for the x86 ISA. Our analysis concludes that carefully chosen simulation points faithfully represent the workload: we observe <1% variance in the instruction distribution between full runs and runs using SimPoints, while reducing simulation time by ~750x. We also show that, on average, just 12 phases can faithfully represent the 90th percentile of a benchmark's behavior, which can reduce the overall simulation time by up to ~1297x. In addition, comparing performance statistics from native binaries on real hardware with those from an architectural model of the same machine using SimPoints, we report good correlation between the two on metrics such as CPI. Finally, we present cases, such as memory hierarchy explorations, where SimPoints should be used with extreme caution in order to derive correct conclusions: inappropriately chosen SimPoint configurations can show large deviations in memory hierarchy behavior compared to full runs, as reported by prior studies.
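To make the sampling idea concrete, the following is a minimal Python sketch of how per-phase results are commonly combined into a whole-program estimate, and of how a subset of the heaviest phases might be chosen to cover a target fraction of execution. It is an illustration under stated assumptions, not the authors' tooling or the PinPoints implementation: the `SimPoint` class, the function names, and every weight and CPI value below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimPoint:
    """One representative interval chosen for a phase (hypothetical structure)."""
    phase_id: int
    weight: float  # fraction of execution attributed to this phase
    cpi: float     # CPI measured by simulating only this interval


def weighted_cpi(points):
    """Estimate overall CPI as the weight-normalized sum of per-point CPIs."""
    total = sum(p.weight for p in points)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p.weight * p.cpi for p in points) / total


def top_phases(points, coverage=0.9):
    """Return the fewest heaviest points whose weights reach `coverage` of the total."""
    total = sum(p.weight for p in points)
    chosen, covered = [], 0.0
    for p in sorted(points, key=lambda p: p.weight, reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(p)
        covered += p.weight
    return chosen


# Hypothetical weights and CPIs, invented purely for illustration.
points = [
    SimPoint(0, 0.5, 1.0),
    SimPoint(1, 0.25, 2.0),
    SimPoint(2, 0.125, 4.0),
    SimPoint(3, 0.0625, 0.5),
    SimPoint(4, 0.0625, 8.0),
]

full_estimate = weighted_cpi(points)          # uses every simulation point
subset = top_phases(points, coverage=0.9)     # heaviest phases covering 90%
subset_estimate = weighted_cpi(subset)        # renormalized over the subset

print(f"all points:  CPI estimate {full_estimate:.4f}")
print(f"90% subset:  CPI estimate {subset_estimate:.4f} "
      f"from {len(subset)} of {len(points)} points")
```

In this toy input the dropped phase has an unusually high CPI, so the two estimates differ noticeably. That is the speed versus accuracy trade-off in miniature: simulating fewer phases is cheaper, but the estimate degrades when the omitted phases behave differently from the retained ones.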