Green Data Center Cooling Control via Physics-Guided Safe Reinforcement Learning

Ruihang Wang, Zhi-Ying Cao, Xiaoxia Zhou, Yonggang Wen, Rui Tan

ACM Transactions on Cyber-Physical Systems, February 2023. DOI: 10.1145/3582577.
Deep reinforcement learning (DRL) has shown strong performance in solving Markov decision process (MDP) problems. Because DRL optimizes a long-term reward, it is a promising approach to improving the energy efficiency of data center cooling. However, enforcing thermal safety constraints during DRL's state exploration remains a key challenge. The widely adopted reward shaping approach adds a negative reward when an exploratory action results in an unsafe state; the agent must therefore experience sufficiently many unsafe states before it learns to avoid them. In this paper, we propose a safety-aware DRL framework for data center cooling control. It applies offline imitation learning and online post-hoc rectification to holistically prevent thermal unsafety during online DRL. In particular, the post-hoc rectification searches for the minimum modification to the DRL-recommended action such that the rectified action will not result in an unsafe state. The rectification is designed around a thermal state transition model that is fitted using historical safe operation traces and can extrapolate the transitions to unsafe states explored by DRL. Extensive evaluations of chilled water and direct expansion-cooled data centers under two climate conditions show that our approach saves 18% to 26.6% of total data center power compared with conventional control and reduces safety violations by 94.5% to 99% compared with reward shaping. We also extend the proposed framework to data centers with non-uniform temperature distributions for finer-grained safety considerations. The evaluation shows that our approach saves 14% of power usage compared with PID control while maintaining safety compliance during training.
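The post-hoc rectification described in the abstract amounts to a constrained projection: among all actions predicted safe, find the one closest to the DRL recommendation, i.e., minimize ‖a′ − a‖² subject to the fitted transition model predicting a safe next thermal state and to the actuator limits. Below is a minimal Python sketch of this idea, assuming a linear transition model fitted by least squares and a single upper temperature threshold; every name here (fit_transition_model, rectify_action, T_MAX) is illustrative rather than taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

T_MAX = 30.0  # assumed safety threshold (deg C) on predicted temperatures

def fit_transition_model(states, actions, next_temps):
    """Least-squares fit of next temperatures from (state, action) pairs,
    using historical safe operation traces. states/actions/next_temps are
    2D arrays with one row per recorded time step."""
    X = np.hstack([states, actions, np.ones((len(states), 1))])
    W, *_ = np.linalg.lstsq(X, next_temps, rcond=None)
    return W  # shape: (state_dim + action_dim + 1, n_temps)

def predict_next_temps(W, state, action):
    """Predicted next temperatures under the fitted linear model."""
    return np.concatenate([state, action, [1.0]]) @ W

def rectify_action(W, state, drl_action, lo, hi):
    """Minimum modification to the DRL-recommended action such that all
    predicted next temperatures stay at or below T_MAX."""
    if np.all(predict_next_temps(W, state, drl_action) <= T_MAX):
        return drl_action  # already predicted safe: pass through unchanged
    result = minimize(
        lambda a: np.sum((a - drl_action) ** 2),  # distance to recommendation
        x0=drl_action,
        bounds=list(zip(lo, hi)),                 # actuator limits
        constraints={"type": "ineq",              # safe iff T_MAX - T_hat >= 0
                     "fun": lambda a: T_MAX - predict_next_temps(W, state, a)},
        method="SLSQP",
    )
    return result.x
```

In deployment, a layer like this would run after each DRL inference step, passing the agent's action through unchanged when it is predicted safe and projecting it back into the safe set otherwise. The paper fits its transition model from historical safe operation traces precisely so that this prediction can extrapolate to the unsafe states DRL explores; the linear model above is only a stand-in for whatever model those traces support.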