DRLCAP: Runtime GPU Frequency Capping With Deep Reinforcement Learning

Impact factor: 3.0 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Hardware & Architecture)
Yiming Wang;Meng Hao;Hui He;Weizhe Zhang;Qiuyuan Tang;Xiaoyang Sun;Zheng Wang
DOI: 10.1109/TSUSC.2024.3362697
Journal: IEEE Transactions on Sustainable Computing, vol. 9, no. 5, pp. 712-726
Published: 2024-02-06
Publication type: Journal Article
Open access: No
URL: https://ieeexplore.ieee.org/document/10423248/
Indexed via: Semanticscholar
Citations: 0

Abstract

Power and energy consumption are limiting factors of modern computing systems. As the GPU becomes a mainstream computing device, GPU power management becomes increasingly important. Existing work focuses on kernel-level GPU power management, which is hard to port because it depends on architecture-specific details. We present DRLCap, a general runtime power management framework intended to support power management across various GPU architectures. It periodically monitors system-level information to dynamically detect program phase changes and to model workload and GPU system behavior. Removing kernel-specific constraints improves adaptability and responsiveness. The framework controls power consumption through dynamic GPU frequency capping, the most widely used power knob. DRLCap employs deep reinforcement learning (DRL) to adapt to program phase changes, automatically adjusting its power policy through online learning, with the aim of reducing GPU power consumption without significantly compromising application performance. We evaluate DRLCap on three NVIDIA GPU architectures and one AMD GPU architecture. Experimental results show that DRLCap outperforms prior GPU power optimization strategies by a large margin. On NVIDIA GPUs, it reduces GPU energy consumption by 22% on average with less than 3% performance slowdown. This translates to a 20% improvement in energy efficiency, measured by the energy-delay product (EDP), over the default NVIDIA GPU power management strategy. On the AMD GPU architecture, DRLCap reduces energy consumption by 10% on average with a 4% performance loss, and improves energy efficiency by 8%.
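The relationship between the energy, slowdown, and EDP figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the baseline is normalised to 1.0, the "less than 3%" slowdown is taken at its upper bound, and the AMD "4%" loss is assumed to be a runtime slowdown. It also applies the formula to the average figures, whereas the paper presumably averages per-benchmark EDP values, so the outputs need not match the reported 20% and 8% exactly.

```python
def edp(energy: float, delay: float) -> float:
    """Energy-delay product; lower is better."""
    return energy * delay


def edp_improvement(energy_saving: float, slowdown: float) -> float:
    """Fractional EDP reduction against a baseline normalised to 1.0.

    energy_saving: fraction of energy saved (0.22 means 22% less energy).
    slowdown: fractional runtime increase (0.03 means 3% slower).
    """
    baseline = edp(1.0, 1.0)
    capped = edp(1.0 - energy_saving, 1.0 + slowdown)
    return 1.0 - capped / baseline


# Average figures quoted in the abstract (assumed interpretation, see lead-in).
nvidia = edp_improvement(energy_saving=0.22, slowdown=0.03)
amd = edp_improvement(energy_saving=0.10, slowdown=0.04)

print(f"NVIDIA EDP improvement: {nvidia:.1%}")
print(f"AMD EDP improvement: {amd:.1%}")
```

Under these assumptions the NVIDIA estimate comes out close to the reported 20%, while the AMD estimate is somewhat below the reported 8%, which is the kind of gap expected when an average of ratios is compared with a ratio of averages.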
Source journal
IEEE Transactions on Sustainable Computing (subject area: Mathematics, Control and Optimization)
CiteScore: 7.70
Self-citation rate: 2.60%
Articles published: 54