Safe HVAC Control via Batch Reinforcement Learning

Hsin-Yu Liu, Bharathan Balaji, Sicun Gao, Rajesh K. Gupta, Dezhi Hong
{"title":"Safe HVAC Control via Batch Reinforcement Learning","authors":"Hsin-Yu Liu, Bharathan Balaji, Sicun Gao, Rajesh K. Gupta, Dezhi Hong","doi":"10.1109/iccps54341.2022.00023","DOIUrl":null,"url":null,"abstract":"Buildings account for 30% of energy use worldwide, and approxi-mately half of it is ascribed to HVAC systems. Reinforcement Learning (RL) has improved upon traditional control methods in increasing the energy efficiency of HVAC systems. However, prior works use online RL methods that require configuring complex thermal simulators to train or use historical data-driven thermal models that can take at least 104 time steps to reach rule-based performance Also, due to the distribution drift from simulator to real buildings, RL solutions are therefore seldom deployed in the real world. On the other hand, batch RL methods can learn from the historical data and improve upon the existing policy without any interactions with the real buildings or simulators during the training. With the existing rule-based policy as the priors, the policies learned with batch RL are better than the existing control from the first day of deployment with very few training steps compared with online methods. Our algorithm incorporates a Kullback-Leibler (KL) regularization term to penalize policies that deviate far from the previous ones. We evaluate our framework on a real multi-zone, multi-floor building-it achieves 7.2% in energy reduction cf. the state-of-the-art batch RL method, and outperforms other BRL methods in occu-pants' thermal comfort, and 16.7% energy reduction compared to the default rule-based control.","PeriodicalId":340078,"journal":{"name":"2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/iccps54341.2022.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

Buildings account for 30% of energy use worldwide, and approximately half of it is ascribed to HVAC systems. Reinforcement Learning (RL) has improved upon traditional control methods in increasing the energy efficiency of HVAC systems. However, prior works use online RL methods that either require configuring complex thermal simulators to train, or use historical data-driven thermal models that can take at least 10^4 time steps to reach rule-based performance. Also, due to the distribution drift from simulators to real buildings, RL solutions are seldom deployed in the real world. In contrast, batch RL methods can learn from historical data and improve upon the existing policy without any interaction with real buildings or simulators during training. With the existing rule-based policy as the prior, policies learned with batch RL outperform the existing control from the first day of deployment, with very few training steps compared to online methods. Our algorithm incorporates a Kullback-Leibler (KL) regularization term to penalize policies that deviate far from the previous ones. We evaluate our framework on a real multi-zone, multi-floor building: it achieves a 7.2% energy reduction compared to the state-of-the-art batch RL method, outperforms other batch RL methods in occupants' thermal comfort, and yields a 16.7% energy reduction compared to the default rule-based control.
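The core mechanism described above is a KL-regularized policy objective: the batch-RL agent maximizes the critic's value estimate while a KL term keeps the learned policy close to the prior (e.g., a policy fit to the logged rule-based controller). Below is a minimal sketch of such an update in PyTorch; it is not the authors' implementation, and the network shapes, the coefficient `beta`, and all names (`GaussianPolicy`, `kl_regularized_actor_loss`) are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of a KL-regularized batch-RL
# actor update: maximize the critic's Q-estimate while penalizing
# divergence from a fixed prior (behavior) policy.
import torch
import torch.nn as nn
import torch.distributions as D

class GaussianPolicy(nn.Module):
    """Maps a state to a diagonal Gaussian over actions (e.g., setpoints)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.mu = nn.Linear(state_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, s):
        return D.Normal(self.mu(s), self.log_std.exp())

def kl_regularized_actor_loss(policy, prior, critic, states, beta=0.1):
    """L(theta) = E[ -Q(s, a) + beta * KL(pi_theta(.|s) || pi_prior(.|s)) ],
    with a ~ pi_theta drawn via the reparameterization trick."""
    pi = policy(states)
    with torch.no_grad():
        pi_prior = prior(states)              # prior stays fixed during the update
    actions = pi.rsample()                    # differentiable sample
    q = critic(torch.cat([states, actions], dim=-1)).squeeze(-1)
    kl = D.kl_divergence(pi, pi_prior).sum(-1)  # analytic KL between Gaussians
    return (-q + beta * kl).mean()

# Toy usage on a batch of logged states:
state_dim, action_dim = 8, 2
policy = GaussianPolicy(state_dim, action_dim)
prior = GaussianPolicy(state_dim, action_dim)   # e.g., fit to logged actions
critic = nn.Linear(state_dim + action_dim, 1)   # stand-in Q network
states = torch.randn(32, state_dim)
loss = kl_regularized_actor_loss(policy, prior, critic, states)
loss.backward()
```

Here `beta` trades off exploiting the critic against staying near the behavior policy; in the batch setting this guards against overestimating the value of actions unsupported by the logged data, which is what allows the learned policy to be safe from the first day of deployment.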