Safe HVAC Control via Batch Reinforcement Learning
Hsin-Yu Liu, Bharathan Balaji, Sicun Gao, Rajesh K. Gupta, Dezhi Hong
2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS), May 2022. DOI: 10.1109/iccps54341.2022.00023
Citations: 8
Abstract
Buildings account for 30% of energy use worldwide, and approximately half of it is ascribed to HVAC systems. Reinforcement Learning (RL) has improved upon traditional control methods in increasing the energy efficiency of HVAC systems. However, prior works use online RL methods that require configuring complex thermal simulators to train, or use historical data-driven thermal models that can take at least 10^4 time steps to reach rule-based performance. Also, due to the distribution drift from simulator to real buildings, RL solutions are seldom deployed in the real world. On the other hand, batch RL methods can learn from the historical data and improve upon the existing policy without any interactions with the real buildings or simulators during training. With the existing rule-based policy as the prior, the policies learned with batch RL are better than the existing control from the first day of deployment, with very few training steps compared with online methods. Our algorithm incorporates a Kullback-Leibler (KL) regularization term to penalize policies that deviate far from the previous ones. We evaluate our framework on a real multi-zone, multi-floor building: it achieves a 7.2% energy reduction compared to the state-of-the-art batch RL method, outperforms other batch RL methods in occupants' thermal comfort, and achieves a 16.7% energy reduction compared to the default rule-based control.
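The abstract does not spell out the exact objective, but the core idea of KL-regularized batch RL can be illustrated with a minimal sketch. The snippet below, in PyTorch, shows a policy loss that maximizes expected Q-values while penalizing divergence from a prior (e.g., the rule-based behavior) policy. The function name, the discrete action space, the `beta` coefficient, and the uniform prior stand-in are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def kl_regularized_policy_loss(q_values, policy_logits, prior_logits, beta=0.1):
    """Sketch of a KL-regularized batch RL policy loss (assumed form).

    q_values:      (batch, n_actions) Q-value estimates from a learned critic
    policy_logits: (batch, n_actions) logits of the policy being trained
    prior_logits:  (batch, n_actions) logits of the prior (e.g., rule-based) policy
    beta:          weight on the KL penalty (hypothetical hyperparameter)
    """
    policy = F.softmax(policy_logits, dim=-1)
    log_policy = F.log_softmax(policy_logits, dim=-1)
    log_prior = F.log_softmax(prior_logits, dim=-1)

    # Expected Q-value under the current policy (the term to maximize)
    expected_q = (policy * q_values).sum(dim=-1)

    # KL(pi || prior) penalizes deviation from the prior policy,
    # keeping the learned policy close to known-safe behavior
    kl = (policy * (log_policy - log_prior)).sum(dim=-1)

    # Minimize: -E[Q] + beta * KL
    return (-expected_q + beta * kl).mean()

# Example usage with a hypothetical discrete action space of 5 setpoint choices
q = torch.randn(32, 5)                           # critic's Q estimates for a batch
pi_logits = torch.randn(32, 5, requires_grad=True)
prior_logits = torch.zeros(32, 5)                # uniform prior as a stand-in
loss = kl_regularized_policy_loss(q, pi_logits, prior_logits)
loss.backward()
```

A higher `beta` keeps the learned policy closer to the existing control, which is consistent with the paper's motivation: safe improvement from day one of deployment, without online interaction.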