Phyllis: Physics-Informed Lifelong Reinforcement Learning for Data Center Cooling Control

Ruihang Wang, Zhi-Ying Cao, Xiaoxia Zhou, Yonggang Wen, Rui Tan
{"title":"Phyllis: Physics-Informed Lifelong Reinforcement Learning for Data Center Cooling Control","authors":"Ruihang Wang, Zhi-Ying Cao, Xiaoxia Zhou, Yonggang Wen, Rui Tan","doi":"10.1145/3575813.3595189","DOIUrl":null,"url":null,"abstract":"Deep reinforcement learning (DRL) has shown good performance in data center cooling control for improving energy efficiency. The main challenge in deploying the DRL agent to real-world data centers is how to quickly adapt the agent to the ever-changing system with thermal safety compliance. Existing approaches rely on DRL’s native fine-tuning or a learned data-driven dynamics model to assist the adaptation. However, they require long-term unsafe exploration before the agent or the model can capture a new environment. This paper proposes Phyllis, a physics-informed reinforcement learning approach to assist the DRL agent’s lifelong learning under evolving data center environment. Phyllis first identifies a transition model to capture the data hall thermodynamics in the offline stage. When the environment changes in the online stage, Phyllis assists the adaptation by i) supervising safe data collection with the identified transition model, ii) fitting power usage and residual thermal models, iii) pretraining the agent by interacting with these models, and iv) deploying the agent for further fine-tuning. Phyllis uses known physical laws to inform the transition and power models for improving the extrapolation ability to unseen states. Extensive evaluation for two simulated data centers with different system changes shows that Phyllis saves 5.7% to 13.8% energy usage compared with feedback cooling control and adapts to new environments 8x to 10x faster than fine-tuning with at most 0.74°C temperature overshoot.","PeriodicalId":359352,"journal":{"name":"Proceedings of the 14th ACM International Conference on Future Energy Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th ACM International Conference on Future Energy Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3575813.3595189","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Deep reinforcement learning (DRL) has shown good performance in data center cooling control for improving energy efficiency. The main challenge in deploying a DRL agent to real-world data centers is quickly adapting the agent to the ever-changing system while maintaining thermal safety compliance. Existing approaches rely on DRL's native fine-tuning or a learned data-driven dynamics model to assist the adaptation. However, they require long-term unsafe exploration before the agent or the model can capture a new environment. This paper proposes Phyllis, a physics-informed reinforcement learning approach that assists the DRL agent's lifelong learning in an evolving data center environment. Phyllis first identifies a transition model to capture the data hall thermodynamics in the offline stage. When the environment changes in the online stage, Phyllis assists the adaptation by i) supervising safe data collection with the identified transition model, ii) fitting power usage and residual thermal models, iii) pretraining the agent by interacting with these models, and iv) deploying the agent for further fine-tuning. Phyllis uses known physical laws to inform the transition and power models, improving their ability to extrapolate to unseen states. Extensive evaluation on two simulated data centers with different system changes shows that Phyllis reduces energy usage by 5.7% to 13.8% compared with feedback cooling control and adapts to new environments 8x to 10x faster than fine-tuning, with at most 0.74°C temperature overshoot.
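The abstract describes a four-step online adaptation procedure built around a transition model identified offline from logged data. As a concrete illustration, the minimal sketch below shows one plausible form of such a physics-informed transition model and its use for safety screening (step i): a linear energy-balance structure whose coefficients are identified by least squares, then used to veto candidate actions predicted to violate a thermal limit. All names here (PhysicsInformedTransitionModel, is_safe, the linear model structure, the 27°C limit) are illustrative assumptions, not the authors' implementation; steps ii–iv (fitting power and residual thermal models, pretraining against them, then fine-tuning) would follow the same model-assisted pattern.

```python
import numpy as np


class PhysicsInformedTransitionModel:
    """Illustrative linear transition model for data-hall thermodynamics.

    A discrete-time energy balance suggests the structure
        T_next = A @ T + B @ u + C @ d,
    where T are zone temperatures, u are cooling actions (e.g., fan speeds,
    setpoints), and d is the IT load. Physics fixes the model structure;
    only the coefficients are identified from offline data.
    """

    def __init__(self, n_temp, n_act, n_load):
        self.A = np.zeros((n_temp, n_temp))
        self.B = np.zeros((n_temp, n_act))
        self.C = np.zeros((n_temp, n_load))

    def identify(self, temps, acts, loads, next_temps):
        """Least-squares fit of [A B C] from logged trajectories (offline stage)."""
        X = np.hstack([temps, acts, loads])  # (N, n_temp + n_act + n_load)
        theta, *_ = np.linalg.lstsq(X, next_temps, rcond=None)
        nt, na = temps.shape[1], acts.shape[1]
        self.A = theta[:nt].T
        self.B = theta[nt:nt + na].T
        self.C = theta[nt + na:].T

    def predict(self, temp, act, load):
        return self.A @ temp + self.B @ act + self.C @ load


def is_safe(model, temp, act, load, t_max=27.0):
    """Step i of the online stage: screen a candidate action by predicting the
    next temperatures; collect data only if no zone is predicted to exceed
    the thermal limit."""
    return bool(np.all(model.predict(temp, act, load) <= t_max))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, nt, na, nl = 500, 2, 2, 1
    temps = rng.uniform(18.0, 30.0, (N, nt))
    acts = rng.uniform(0.0, 1.0, (N, na))
    loads = rng.uniform(0.0, 1.0, (N, nl))

    # Synthetic ground-truth dynamics, used here only to generate data.
    A_true = np.array([[0.90, 0.05], [0.05, 0.90]])
    B_true = np.array([[-2.0, 0.0], [0.0, -2.0]])  # more cooling -> lower temps
    C_true = np.array([[3.0], [3.0]])              # more IT load -> higher temps
    next_temps = temps @ A_true.T + acts @ B_true.T + loads @ C_true.T

    model = PhysicsInformedTransitionModel(nt, na, nl)
    model.identify(temps, acts, loads, next_temps)
    print("safe:", is_safe(model, temps[0], acts[0], loads[0]))
```

Constraining the model to a physically motivated structure, rather than fitting a black-box network, is one way to obtain the extrapolation ability to unseen states that the abstract attributes to Phyllis: the identified coefficients encode sign and coupling relations (more cooling lowers temperatures, more IT load raises them) that remain valid outside the training distribution.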