{"title":"Gnu-RL:一个使用可微MPC策略的建筑HVAC控制的早熟强化学习解决方案","authors":"Bingqing Chen, Zicheng Cai, M. Berges","doi":"10.1145/3360322.3360849","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) was first demonstrated to be a feasible approach to controlling heating, ventilation, and air conditioning (HVAC) systems more than a decade ago. However, there has been limited progress towards a practical and scalable RL solution for HVAC control. While one can train an RL agent in simulation, it is not cost-effective to create a model for each thermal zone or building. Likewise, existing RL agents generally take a long time to learn and are opaque to expert interrogation, making them unattractive for real-world deployment. To tackle these challenges, we propose Gnu-RL: a novel approach that enables practical deployment of RL for HVAC control and requires no prior information other than historical data from existing HVAC controllers. To achieve this, Gnu-RL adopts a recently-developed Differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics, making it both sample-efficient and interpretable. Prior to any interaction with the environment, a Gnu-RL agent is pre-trained on historical data using imitation learning, which enables it to match the behavior of the existing controller. Once it is put in charge of controlling the environment, the agent continues to improve its policy end-to-end, using a policy gradient algorithm. We evaluate Gnu-RL on both an EnergyPlus model and a real-world testbed. In both experiments, our agents were directly deployed in the environment after offline pre-training on expert demonstration. In the simulation experiment, our approach saved 6.6% energy compared to the best published RL result for the same environment, while maintaining a higher level of occupant comfort. Next, Gnu-RL was deployed to control the HVAC of a real-world conference room for a three-week period. Our results show that Gnu-RL saved 16.7% of cooling demand compared to the existing controller and tracked temperature set-point better.","PeriodicalId":128826,"journal":{"name":"Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation","volume":"90 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"95","resultStr":"{\"title\":\"Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy\",\"authors\":\"Bingqing Chen, Zicheng Cai, M. Berges\",\"doi\":\"10.1145/3360322.3360849\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) was first demonstrated to be a feasible approach to controlling heating, ventilation, and air conditioning (HVAC) systems more than a decade ago. However, there has been limited progress towards a practical and scalable RL solution for HVAC control. While one can train an RL agent in simulation, it is not cost-effective to create a model for each thermal zone or building. Likewise, existing RL agents generally take a long time to learn and are opaque to expert interrogation, making them unattractive for real-world deployment. To tackle these challenges, we propose Gnu-RL: a novel approach that enables practical deployment of RL for HVAC control and requires no prior information other than historical data from existing HVAC controllers. 
To achieve this, Gnu-RL adopts a recently-developed Differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics, making it both sample-efficient and interpretable. Prior to any interaction with the environment, a Gnu-RL agent is pre-trained on historical data using imitation learning, which enables it to match the behavior of the existing controller. Once it is put in charge of controlling the environment, the agent continues to improve its policy end-to-end, using a policy gradient algorithm. We evaluate Gnu-RL on both an EnergyPlus model and a real-world testbed. In both experiments, our agents were directly deployed in the environment after offline pre-training on expert demonstration. In the simulation experiment, our approach saved 6.6% energy compared to the best published RL result for the same environment, while maintaining a higher level of occupant comfort. Next, Gnu-RL was deployed to control the HVAC of a real-world conference room for a three-week period. Our results show that Gnu-RL saved 16.7% of cooling demand compared to the existing controller and tracked temperature set-point better.\",\"PeriodicalId\":128826,\"journal\":{\"name\":\"Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation\",\"volume\":\"90 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"95\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3360322.3360849\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3360322.3360849","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Gnu-RL: A Precocial Reinforcement Learning Solution for Building HVAC Control Using a Differentiable MPC Policy
Abstract: Reinforcement learning (RL) was first demonstrated to be a feasible approach to controlling heating, ventilation, and air conditioning (HVAC) systems more than a decade ago. However, there has been limited progress towards a practical and scalable RL solution for HVAC control. While one can train an RL agent in simulation, it is not cost-effective to create a model for each thermal zone or building. Likewise, existing RL agents generally take a long time to learn and are opaque to expert interrogation, making them unattractive for real-world deployment. To tackle these challenges, we propose Gnu-RL: a novel approach that enables practical deployment of RL for HVAC control and requires no prior information other than historical data from existing HVAC controllers. To achieve this, Gnu-RL adopts a recently developed differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics, making it both sample-efficient and interpretable. Prior to any interaction with the environment, a Gnu-RL agent is pre-trained on historical data using imitation learning, which enables it to match the behavior of the existing controller. Once put in charge of the environment, the agent continues to improve its policy end-to-end using a policy gradient algorithm. We evaluate Gnu-RL on both an EnergyPlus model and a real-world testbed. In both experiments, our agents were deployed directly in the environment after offline pre-training on expert demonstrations. In the simulation experiment, our approach saved 6.6% energy compared to the best published RL result for the same environment, while maintaining a higher level of occupant comfort. Next, Gnu-RL was deployed to control the HVAC system of a real-world conference room for three weeks. Our results show that Gnu-RL reduced cooling demand by 16.7% compared to the existing controller and tracked the temperature set-point more closely.
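To make the two-phase scheme concrete, below is a minimal sketch of the offline imitation phase, assuming the open-source mpc.pytorch package (github.com/locuslab/mpc.pytorch), which provides the differentiable MPC layer this line of work builds on. The state/control dimensions, horizon, quadratic cost, and the synthetic historical data are illustrative placeholders, not the paper's configuration.

```python
# Illustrative sketch only: package usage, dimensions, and data are assumptions,
# not the authors' exact implementation. Assumes mpc.pytorch
# (github.com/locuslab/mpc.pytorch), the differentiable MPC solver of Amos et al.
import torch
from mpc import mpc

n_state, n_ctrl, T, n_batch = 3, 1, 12, 16   # hypothetical sizes
n_sc = n_state + n_ctrl

# Learnable linear dynamics x_{t+1} = F @ [x_t; u_t] -- the system model
# that imitation learning (and later policy gradients) refines.
F_param = torch.nn.Parameter(0.1 * torch.randn(n_state, n_sc))
optimizer = torch.optim.Adam([F_param], lr=1e-3)

# A fixed PSD quadratic cost over [state; control]; in the paper the cost
# encodes set-point tracking (comfort) and control effort (energy).
C = torch.eye(n_sc).repeat(T, n_batch, 1, 1)
c = torch.zeros(T, n_batch, n_sc)

controller = mpc.MPC(
    n_state, n_ctrl, T,
    u_lower=torch.zeros(T, n_batch, n_ctrl),  # normalized actuation limits,
    u_upper=torch.ones(T, n_batch, n_ctrl),   # e.g. a valve position in [0, 1]
    lqr_iter=20,
    exit_unconverged=False,
    backprop=True,                            # differentiate through the solver
)

def policy(x_init):
    """Differentiable MPC policy: plan T steps from x_init, return the plan."""
    F = F_param.repeat(T, n_batch, 1, 1)
    _, u_plan, _ = controller(x_init, mpc.QuadCost(C, c), mpc.LinDx(F))
    return u_plan

# Synthetic stand-in for logged (state, expert-action) pairs from the
# existing HVAC controller.
historical_batches = [
    (torch.randn(n_batch, n_state), torch.rand(n_batch, n_ctrl))
    for _ in range(10)
]

# Phase 1: offline pre-training by imitation (behavior cloning): match the
# first planned action to the expert's logged action, backpropagating
# through the MPC solver into the dynamics parameters.
for x_hist, u_expert in historical_batches:
    u_plan = policy(x_hist)
    loss = torch.nn.functional.mse_loss(u_plan[0], u_expert)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Phase 2 (online, not shown): once deployed, the same differentiable planner
# is updated end-to-end with a policy-gradient algorithm, using observed
# energy/comfort rewards instead of expert actions.
```

The key design point the abstract highlights is that the planner itself is the policy: because gradients flow through the MPC solve, the same parameters can first be fit to expert demonstrations and then refined online from rewards. Treat the snippet as a sketch of that interface under the stated assumptions, not as the paper's configuration.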