Xin You, Hailong Yang, Zhibo Xuan, Zhongzhi Luan, D. Qian
PowerSpector: Towards Energy Efficiency with Calling-Context-Aware Profiling
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), published 2022-05-01
DOI: https://doi.org/10.1109/ipdps53621.2022.00126
Citations: 2
Abstract
Energy efficiency has become a major concern for high-performance computing systems as they approach exascale. On mainstream systems, dynamic voltage and frequency scaling (DVFS) and uncore frequency scaling (UFS) are two popular techniques for trading off performance against power consumption to achieve better energy efficiency. However, existing system software is oblivious to application characteristics and thus misses opportunities for fine-grained power management. Meanwhile, manually instrumenting applications with power management code is prohibitive because of the heavy engineering effort involved, and the result is hardly portable across platforms. In this paper, we propose PowerSpector, a fine-grained code profiling and optimization tool with calling-context awareness that automatically explores opportunities for optimizing energy efficiency. The design of PowerSpector consists of three phases: significant region detection, performance profiling and power modeling, and frequency optimization. The first phase automatically identifies the regions that are profitable for frequency optimization. The second phase guides core/uncore frequency optimization with power models. The third phase automatically injects frequency optimization code targeting each significant code region across different calling contexts. Experimental results demonstrate that PowerSpector can achieve 1.13× (1.00×), 1.28× (1.09×), and 1.17× (1.06×) improvements in energy efficiency compared to static (region-based) tuning on Haswell, Broadwell, and Skylake platforms, respectively.
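To illustrate why calling context can matter for frequency selection, the following is a minimal sketch and not PowerSpector's actual model or implementation. It assumes that some power model has already predicted runtime and power for each candidate (core, uncore) frequency pair, and that energy is simply runtime times power. The function name `pick_frequencies`, the context strings, and all numbers are hypothetical.

```python
from typing import Dict, Tuple

# Assumed data layout (hypothetical):
# (core GHz, uncore GHz) -> (predicted runtime in s, predicted power in W)
Predictions = Dict[Tuple[float, float], Tuple[float, float]]


def pick_frequencies(predictions: Predictions) -> Tuple[float, float]:
    """Return the (core, uncore) pair with the lowest predicted energy.

    Energy is approximated as runtime * power, which is an assumption
    of this sketch rather than a metric taken from the paper.
    """
    return min(predictions, key=lambda f: predictions[f][0] * predictions[f][1])


# The same code region "solve" reached through two calling contexts.
# All values below are made up for illustration.
per_context: Dict[str, Predictions] = {
    # Compute-bound in this context: lowering the uncore frequency costs little time.
    "main>assemble>solve": {
        (2.4, 2.0): (1.00, 100.0),
        (2.4, 1.2): (1.02, 90.0),
        (1.6, 2.0): (1.45, 80.0),
        (1.6, 1.2): (1.48, 72.0),
    },
    # Memory-bound in this context: lowering the core frequency costs little time.
    "main>refine>solve": {
        (2.4, 2.0): (2.00, 100.0),
        (2.4, 1.2): (2.60, 90.0),
        (1.6, 2.0): (2.05, 80.0),
        (1.6, 1.2): (2.70, 72.0),
    },
}

plan = {ctx: pick_frequencies(preds) for ctx, preds in per_context.items()}
for ctx, (core, uncore) in plan.items():
    print(f"{ctx}: core={core} GHz, uncore={uncore} GHz")
```

With these made-up numbers the two contexts end up with different settings for the same region, which is the kind of opportunity a purely region-based scheme would miss.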