Speedup and Parallelization Models for Energy-Efficient Many-Core Systems Using Performance Counters

2017 International Conference on High Performance Computing & Simulation (HPCS). Published: 2017-07-17. DOI: 10.1109/HPCS.2017.68

M. A. N. Al-hayanni, R. Shafik, A. Rafiev, F. Xia, A. Yakovlev


Citations: 8

Abstract



Traditional speedup models, such as Amdahl's law, help in studying the impact of running parallel workloads on many-core systems. However, these models are typically based on software characteristics and assume ideal hardware behavior. As such, their applicability to energy- and/or performance-driven system optimization is limited by two factors. First, speedup cannot be measured without instrumenting the original software code; second, the parallelization factor of an application running on specific hardware is generally unknown. In this paper, we propose a novel method whereby the standard performance counters found in modern many-core platforms are used to derive speedup without instrumenting applications for time measurements. We postulate that speedup can be accurately estimated as the ratio of the instructions per cycle (IPC) of a parallel many-core system to the IPC of a single-core system. By studying application instructions and system instructions for the first time, our method determines the parallelization factor and the optimal system configuration for energy and/or performance. The method is extensively demonstrated through experiments on three different platforms, with core counts ranging from 4 to 61, running parallel benchmark applications (including synthetic and PARSEC benchmarks) on the Linux operating system. Speedup and parallelization estimates obtained with our method, together with extensive cross-validation, show low errors (at most 8%) on these systems. Additionally, we demonstrate the effectiveness of our method in exploring parallelization-aware, energy-efficient configurations for many-core systems using formulations based on the energy-delay product (EDP).
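The IPC-ratio idea can be made concrete with one line of algebra. Assuming the single-core and n-core runs retire the same total number of instructions I at the same clock frequency, and that C denotes elapsed (wall-clock) cycles rather than cycles summed over cores, IPC_n / IPC_1 = (I / C_n) / (I / C_1) = C_1 / C_n = T_1 / T_n, which is the conventional definition of speedup. The Python sketch below illustrates that estimate, recovers a parallel fraction p by inverting Amdahl's law S(n) = 1 / ((1 - p) + p/n), and picks the core count with the lowest EDP. It is a sketch under stated assumptions, not the authors' implementation: the Amdahl inversion is a standard stand-in for the paper's parallelization-factor derivation (which also distinguishes application from system instructions), EDP is taken as E * t with E = P * t, and `CounterSample`, `estimate_speedup`, `parallel_fraction`, `best_config_by_edp` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical counter readings for one run of a workload.

    instructions: retired instructions summed over all active cores.
    elapsed_cycles: wall-clock duration expressed in cycles at a fixed
        frequency (assumption: NOT core cycles summed over cores).
    """
    cores: int
    instructions: int
    elapsed_cycles: int

    @property
    def ipc(self) -> float:
        # Aggregate instructions per elapsed cycle for the whole system.
        return self.instructions / self.elapsed_cycles


def estimate_speedup(single: CounterSample, parallel: CounterSample) -> float:
    """Speedup estimated as IPC(n cores) / IPC(1 core)."""
    return parallel.ipc / single.ipc


def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup: float, n: int) -> float:
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n), valid for n > 1."""
    if n < 2:
        raise ValueError("need at least two cores to infer p")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


def best_config_by_edp(power_w: dict, time_s: dict) -> int:
    """Return the core count with the lowest energy-delay product.

    Assumes constant average power P per run, so E = P * t and
    EDP = E * t = P * t**2.
    """
    edp = {n: power_w[n] * time_s[n] ** 2 for n in power_w}
    return min(edp, key=edp.get)


if __name__ == "__main__":
    # Hypothetical readings: both runs retire 8M instructions.
    one = CounterSample(cores=1, instructions=8_000_000, elapsed_cycles=10_000_000)
    four = CounterSample(cores=4, instructions=8_000_000, elapsed_cycles=3_250_000)
    s = estimate_speedup(one, four)
    print(f"estimated speedup: {s:.3f}")                          # 3.077
    print(f"parallel fraction p: {parallel_fraction(s, 4):.3f}")  # 0.900
    # Hypothetical average power (W) and run time (s) per core count.
    power = {1: 10.0, 2: 14.0, 4: 22.0}
    times = {1: 10.0, 2: 5.5, 4: 3.25}
    print("lowest-EDP core count:", best_config_by_edp(power, times))  # 4
```

Because EDP weights delay quadratically, a configuration that draws more power can still win if it finishes sufficiently sooner; with the hypothetical power figure for 4 cores raised to 60 W in the example, the 2-core configuration would have the lower EDP instead.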
