Toward Physics-Guided Safe Deep Reinforcement Learning for Green Data Center Cooling Control

Ruihang Wang, Xinyi Zhang, Xiaoxia Zhou, Yonggang Wen, Rui Tan
DOI: 10.1109/iccps54341.2022.00021
Published in: 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS), May 2022
Citations: 6

Abstract

Deep reinforcement learning (DRL) has shown good performance in tackling Markov decision process (MDP) problems. As DRL optimizes a long-term reward, it is a promising approach to improving the energy efficiency of data center cooling. However, enforcing the thermal safety constraint during DRL's state exploration is a main challenge. The widely adopted reward shaping approach adds a negative reward when an exploratory action results in unsafety; thus, the agent needs to experience sufficiently many unsafe states before it learns how to prevent unsafety. In this paper, we propose a safety-aware DRL framework for single-hall data center cooling control. It applies offline imitation learning and online post-hoc rectification to holistically prevent thermal unsafety during online DRL. In particular, the post-hoc rectification searches for the minimum modification to the DRL-recommended action such that the rectified action will not result in unsafety. The rectification is designed based on a thermal state transition model that is fitted using historical safe operation traces and is able to extrapolate the transitions to unsafe states explored by DRL. Extensive evaluation for chilled water and direct expansion cooled data centers under two climate conditions shows that our approach saves 22.7% to 26.6% of total data center power compared with conventional control and reduces safety violations by 94.5% to 99% compared with reward shaping.
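The post-hoc rectification described above — finding the minimum modification to the recommended action so that the predicted next thermal state stays safe — can be illustrated as a projection onto a safety constraint. The sketch below is not the authors' implementation: it assumes, for illustration only, that the fitted thermal transition model has been linearized around the current state into a single constraint w·a + b ≤ 0 (predicted temperature within the limit), in which case the minimum-L2 rectification has a closed form.

```python
import numpy as np

def rectify_action(action: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Minimally modify `action` so the linearized safety constraint
    w . a + b <= 0 holds (e.g., predicted supply-air temperature stays
    below its threshold under a locally linear transition model).

    This is the closed-form L2 projection of `action` onto the
    half-space {a : w . a + b <= 0}.
    """
    violation = float(w @ action) + b
    if violation <= 0.0:
        return action  # already predicted safe: no modification needed
    # Move against the constraint normal just far enough to reach the boundary.
    return action - (violation / float(w @ w)) * w

# Toy example with a hypothetical linearized model: the first action
# dimension (say, setpoint increase) drives the predicted temperature.
recommended = np.array([0.5, 0.2])
w, b = np.array([1.0, 0.0]), 0.0
safe_action = rectify_action(recommended, w, b)
```

In the paper's setting the constraint comes from the fitted thermal state transition model rather than a hand-written linear form, and a nonlinear model would require an iterative search instead of this one-step projection; the sketch only conveys the "smallest change that restores predicted safety" idea.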