DRLCAP: Runtime GPU Frequency Capping With Deep Reinforcement Learning
Yiming Wang; Meng Hao; Hui He; Weizhe Zhang; Qiuyuan Tang; Xiaoyang Sun; Zheng Wang
IEEE Transactions on Sustainable Computing, vol. 9, no. 5, pp. 712-726. Published 2024-02-06.
Journal impact factor: 3.0 (JCR Q2, Computer Science, Hardware & Architecture)
DOI: 10.1109/TSUSC.2024.3362697
https://ieeexplore.ieee.org/document/10423248/
Citations: 0
Abstract
Power and energy consumption are the limiting factors of modern computing systems. As the GPU becomes a mainstream computing device, power management for GPUs becomes increasingly important. Current work focuses on kernel-level GPU power management, which is hard to port because it depends on architecture-specific considerations. We present DRLCap, a general runtime power management framework intended to support power management across various GPU architectures. It periodically monitors system-level information to dynamically detect program phase changes and to model the workload and GPU system behavior. Freeing the framework from kernel-specific constraints enhances adaptability and responsiveness. The framework uses dynamic GPU frequency capping, the most widely used power knob, to control power consumption. DRLCap employs deep reinforcement learning (DRL) to adapt to changing program phases by automatically adjusting its power policy through online learning, aiming to reduce GPU power consumption without significantly compromising application performance. We evaluate DRLCap on three NVIDIA GPU architectures and one AMD GPU architecture. Experimental results show that DRLCap improves on prior GPU power optimization strategies by a large margin. On average, it reduces GPU energy consumption by 22% with less than 3% performance slowdown on NVIDIA GPUs. This translates to a 20% improvement in energy efficiency, measured by the energy-delay product (EDP), over the default NVIDIA GPU power management strategy. For the AMD GPU architecture, DRLCap reduces energy consumption by 10% on average with a 4% performance loss and improves energy efficiency by 8%.
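The EDP figures quoted in the abstract can be sanity-checked from the energy and slowdown numbers, because EDP is energy multiplied by execution time. The following is a back-of-envelope sketch, not the paper's evaluation method. It assumes the reported averages behave like a single run, takes the NVIDIA slowdown as exactly 3% (the abstract says less than 3%), and reads the AMD "4% loss" as a 4% slowdown. The function name `edp_improvement` is illustrative.

```python
def edp_improvement(energy_saving, slowdown):
    """Fractional EDP reduction relative to a baseline with energy = delay = 1.

    energy_saving: fraction of energy saved (0.22 means 22% less energy).
    slowdown: fractional increase in run time (0.03 means 3% slower).
    """
    relative_energy = 1.0 - energy_saving
    relative_delay = 1.0 + slowdown
    # EDP = energy * delay, so the relative EDP is the product of the two ratios.
    return 1.0 - relative_energy * relative_delay


nvidia = edp_improvement(0.22, 0.03)  # assumption: slowdown taken at its 3% bound
amd = edp_improvement(0.10, 0.04)     # assumption: "4% loss" read as 4% slowdown

print(f"NVIDIA EDP improvement: {nvidia:.1%}")
print(f"AMD EDP improvement:    {amd:.1%}")
```

Under these assumptions the NVIDIA estimate is about 19.7%, in line with the reported 20%. The AMD estimate is about 6.4%, somewhat below the reported 8%. A gap of this kind is plausible when EDP is averaged per benchmark instead of being computed from averaged energy and time, but the abstract does not say how its averages were formed.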
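The monitor, cap, and learn cycle described in the abstract can be illustrated with a much smaller stand-in. The sketch below is not DRLCap. A tabular, bandit-style value update replaces the deep RL agent, a toy power and runtime model replaces real GPU telemetry, and the program phase is handed to the agent directly, whereas the paper describes detecting phases from monitored system-level information. The cap levels, the model constants, and all function names are hypothetical.

```python
import random

# Hypothetical frequency-cap levels in MHz; real GPUs expose their own supported clocks.
FREQ_CAPS_MHZ = [600, 900, 1200, 1500]
PHASES = ["compute", "memory"]


def simulate_interval(cap_mhz, phase):
    """Toy GPU model (an assumption): return (energy_joules, seconds) for one work unit."""
    slowdown = FREQ_CAPS_MHZ[-1] / cap_mhz        # 1.0 at the highest cap
    if phase == "compute":
        seconds = slowdown                        # run time scales with 1/frequency
    else:
        seconds = 1.0 + 0.05 * (slowdown - 1.0)   # memory-bound: barely affected
    # Static power plus a dynamic term that grows with the cube of frequency.
    power_watts = 50.0 + 150.0 * (cap_mhz / FREQ_CAPS_MHZ[-1]) ** 3
    return power_watts * seconds, seconds


def train(steps=2000, phase_len=50, alpha=0.5, epsilon=0.1, seed=0):
    """Learn one value per (phase, cap) pair online with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {phase: [0.0] * len(FREQ_CAPS_MHZ) for phase in PHASES}
    for step in range(steps):
        # The workload alternates between phases every phase_len intervals.
        phase = PHASES[(step // phase_len) % len(PHASES)]
        if rng.random() < epsilon:
            action = rng.randrange(len(FREQ_CAPS_MHZ))   # explore
        else:
            action = max(range(len(FREQ_CAPS_MHZ)), key=lambda a: q[phase][a])
        energy, seconds = simulate_interval(FREQ_CAPS_MHZ[action], phase)
        reward = -(energy * seconds)                     # negative EDP of the interval
        q[phase][action] += alpha * (reward - q[phase][action])
    return q


def best_cap(q, phase):
    """Return the cap with the highest learned value for the given phase."""
    values = q[phase]
    return FREQ_CAPS_MHZ[values.index(max(values))]


if __name__ == "__main__":
    q_table = train()
    for phase in PHASES:
        print(phase, "->", best_cap(q_table, phase), "MHz")
```

In this toy model the agent settles on a low cap for the memory-bound phase, where lowering the clock costs little time, and on a higher cap for the compute-bound phase. Using negative EDP as the reward is a design choice made here for illustration; the abstract does not specify DRLCap's reward function.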