GPGPU Performance and Power Estimation Using Machine Learning

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), Pub Date: 2015-02-07, DOI: 10.1109/HPCA.2015.7056063

Gene Y. Wu, J. Greathouse, Alexander Lyashevsky, N. Jayasena, Derek Chiou


Citations: 191

Abstract




Graphics Processing Units (GPUs) have numerous configuration and design options, including core frequency, number of parallel compute units (CUs), and available memory bandwidth. At many stages of the design process, it is important to estimate how application performance and power are impacted by these options. This paper describes a GPU performance and power estimation model that uses machine learning techniques on measurements from real GPU hardware. The model is trained on a collection of applications that are run at numerous different hardware configurations. From the measured performance and power data, the model learns how applications scale as the GPU's configuration is changed. Hardware performance counter values are then gathered when running a new application on a single GPU configuration. These dynamic counter values are fed into a neural network that predicts which scaling curve from the training data best represents this kernel. This scaling curve is then used to estimate the performance and power of the new application at different GPU configurations. Over an 8× range of the number of CUs, a 3.3× range of core frequencies, and a 2.9× range of memory bandwidth, our model's performance and power estimates are accurate to within 15% and 10% of real hardware, respectively. This is comparable to the accuracy of cycle-level simulators. However, after an initial training phase, our model runs as fast as, or faster than the program running natively on real hardware.
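The flow described in the abstract (learn scaling curves from training kernels, pick the curve that best matches a new kernel's counters, then scale one measurement to other configurations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy data, the counter names, the three CU counts, and the helper names `cluster_curves`, `train`, and `predict_times` are hypothetical. It also swaps the neural network for a nearest-centroid classifier to stay dependency-free, so it is not the authors' implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_curves(curves, k, iterations=10):
    """Plain k-means over scaling curves, seeded with the first k curves
    so that the result is deterministic."""
    centroids = [list(c) for c in curves[:k]]
    labels = [0] * len(curves)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(c, centroids[j]))
                  for c in curves]
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return centroids, labels


def train(curves, counters, k):
    """Cluster the scaling curves, then summarize each cluster by the mean
    counter vector of its members (assumes no cluster ends up empty)."""
    curve_centroids, labels = cluster_curves(curves, k)
    counter_centroids = [
        mean_vector([v for v, lab in zip(counters, labels) if lab == j])
        for j in range(k)
    ]
    return curve_centroids, counter_centroids


def predict_times(new_counters, base_time, curve_centroids, counter_centroids):
    """Choose the cluster whose counter centroid is nearest (a stand-in for
    the neural network), then divide the measured base time by that
    cluster's speedup at each configuration."""
    j = min(range(len(counter_centroids)),
            key=lambda i: dist(new_counters, counter_centroids[i]))
    return j, [base_time / speedup for speedup in curve_centroids[j]]


# Hypothetical training set: speedup over the 8-CU base at 8, 16, and 32 CUs.
TRAIN_CURVES = [
    [1.0, 1.9, 3.8],  # scales well with CU count
    [1.0, 1.0, 1.1],  # barely scales
    [1.0, 2.0, 4.0],
    [1.0, 1.1, 1.1],
]
# Hypothetical counters per kernel: (ALU utilization, memory utilization).
TRAIN_COUNTERS = [[0.9, 0.2], [0.2, 0.9], [0.8, 0.3], [0.3, 0.8]]

CURVE_CENTROIDS, COUNTER_CENTROIDS = train(TRAIN_CURVES, TRAIN_COUNTERS, k=2)

# New kernel: counters and runtime measured once, at the 8-CU base only.
CLUSTER, TIMES_MS = predict_times([0.88, 0.22], 10.0,
                                  CURVE_CENTROIDS, COUNTER_CENTROIDS)
print(CLUSTER, [round(t, 2) for t in TIMES_MS])
```

The sketch covers only execution time along a single axis (CU count). Under the same assumptions, the curves could be extended to index core frequency and memory bandwidth as well.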

