DRLCAP: Runtime GPU Frequency Capping With Deep Reinforcement Learning

IF 3 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Sustainable Computing | Pub Date: 2024-02-06 | DOI: 10.1109/TSUSC.2024.3362697

Yiming Wang, Meng Hao, Hui He, Weizhe Zhang, Qiuyuan Tang, Xiaoyang Sun, Zheng Wang

Vol. 9, No. 5, pp. 712-726 | https://ieeexplore.ieee.org/document/10423248/

Citations: 0

Abstract

Power and energy consumption are the limiting factors of modern computing systems. As GPUs become mainstream computing devices, GPU power management grows increasingly important. Existing work focuses on kernel-level GPU power management, which is hard to port across architectures because it depends on architecture-specific considerations. We present DRLCap, a general runtime framework designed to support power management across diverse GPU architectures. DRLCap periodically samples system-level information to detect program phase changes at runtime and to model the behavior of both the workload and the GPU system. Because it is free of kernel-specific constraints, the framework is more adaptable and responsive. It controls power consumption through dynamic GPU frequency capping, the most widely used power knob. DRLCap uses deep reinforcement learning (DRL) to adapt to changing program phases, automatically adjusting its power policy through online learning, with the goal of reducing GPU power consumption without significantly compromising application performance. We evaluate DRLCap on three NVIDIA GPU architectures and one AMD GPU architecture. Experimental results show that DRLCap outperforms prior GPU power optimization strategies by a large margin. On NVIDIA GPUs, it reduces GPU energy consumption by 22% on average, with a performance slowdown of less than 3%. This translates to a 20% improvement in energy efficiency, measured by the energy-delay product (EDP), over NVIDIA's default GPU power management strategy. On the AMD GPU architecture, DRLCap reduces energy consumption by 10% on average, with a 4% performance loss, and improves energy efficiency by 8%.
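The abstract describes the control loop only at a high level, so the Python sketch below is a toy illustration under stated assumptions, not DRLCap's implementation. The frequency caps, the two-phase workload, the power and runtime model, and the names `simulate_interval`, `run_agent`, and `relative_edp` are all hypothetical. A tabular epsilon-greedy value learner (a contextual bandit) stands in for the deep RL agent. The learner uses the negative EDP of each sampling interval as its reward, so it trades power savings against slowdown, and it is told the current phase directly, whereas DRLCap infers phases from monitored system-level metrics.

```python
import random

# Hypothetical discrete frequency caps in MHz; a real GPU exposes its own supported clocks.
FREQ_CAPS = [900, 1100, 1300, 1500, 1700]
F_MAX = max(FREQ_CAPS)
PHASES = ("compute", "memory")  # hypothetical program phases


def relative_edp(energy_saving: float, slowdown: float) -> float:
    """EDP of a policy relative to a baseline (1.0 means unchanged).

    EDP = energy * delay, so the ratio is (1 - energy_saving) * (1 + slowdown).
    """
    return (1.0 - energy_saving) * (1.0 + slowdown)


def simulate_interval(phase: str, cap: int) -> tuple[float, float]:
    """Toy model of one sampling interval (hypothetical, not measured data).

    Returns (energy, delay) for a fixed amount of work. Compute-bound work slows
    in proportion to the clock; memory-bound work is limited mainly by memory
    bandwidth, so lowering the cap costs little time.
    """
    ratio = cap / F_MAX
    if phase == "compute":
        delay = 1.0 / ratio
    else:
        delay = 1.0 / (0.9 + 0.1 * ratio)
    power = 50.0 + 200.0 * ratio ** 3  # static + clock-dependent dynamic power (toy)
    return power * delay, delay


def run_agent(steps: int = 2000, epsilon: float = 0.1, alpha: float = 0.2,
              seed: int = 0) -> dict[str, int]:
    """Learn a frequency cap per phase online, using -EDP of each interval as reward.

    A tabular epsilon-greedy value learner stands in for a deep RL agent.
    """
    rng = random.Random(seed)
    # Start at 0.0: rewards are negative, so untried caps look attractive and get explored.
    q = {(p, c): 0.0 for p in PHASES for c in FREQ_CAPS}
    phase = PHASES[0]
    for step in range(steps):
        # Hypothetical phase change every 100 intervals.
        if step and step % 100 == 0:
            phase = PHASES[1] if phase == PHASES[0] else PHASES[0]
        if rng.random() < epsilon:
            cap = rng.choice(FREQ_CAPS)  # explore
        else:
            cap = max(FREQ_CAPS, key=lambda c: q[(phase, c)])  # exploit
        energy, delay = simulate_interval(phase, cap)
        reward = -(energy * delay)  # negative energy-delay product
        q[(phase, cap)] += alpha * (reward - q[(phase, cap)])
    # Greedy cap for each phase under the learned values.
    return {p: max(FREQ_CAPS, key=lambda c: q[(p, c)]) for p in PHASES}


if __name__ == "__main__":
    print(run_agent())
    print(relative_edp(0.22, 0.03))
```

Under this toy model the learner settles on 1300 MHz for the compute-bound phase and 900 MHz for the memory-bound phase, which illustrates why a phase-aware policy can beat any single fixed cap. Plugging the abstract's NVIDIA averages into `relative_edp(0.22, 0.03)` gives about 0.80, an EDP reduction of roughly 20%, consistent with the reported figure. Because the paper's numbers are averages over benchmarks, they need not combine exactly.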

Source journal: IEEE Transactions on Sustainable Computing (Mathematics-Control and Optimization)

CiteScore: 7.70

Self-citation rate: 2.60%

Articles published: