Rule-based Policy Regularization for Reinforcement Learning-based Building Control

Hsin-Yu Liu, Bharathan Balaji, Rajesh K. Gupta, Dezhi Hong
{"title":"基于规则的基于强化学习的楼宇控制策略规范化","authors":"Hsin-Yu Liu, Bharathan Balaji, Rajesh K. Gupta, Dezhi Hong","doi":"10.1145/3575813.3595202","DOIUrl":null,"url":null,"abstract":"Rule-based control (RBC) is widely adopted in buildings due to its stability and robustness. It resembles a behavior cloning methodology refined by human experts; however, it is incapable of adapting to distribution drifts. Reinforcement learning (RL) can adapt to changes but needs to learn from scratch in the online setting. On the other hand, the learning ability is limited in offline settings due to extrapolation errors caused by selecting out-of-distribution actions. In this paper, we explore how to incorporate RL with a rule-based control policy to combine their strengths to continuously learn a scalable and robust policy in both online and offline settings. We start with representative online and offline RL methods, TD3 and TD3+BC, respectively. Then, we develop a dynamically weighted actor loss function to selectively choose which policy for RL models to learn from at each training iteration. With extensive experiments across various weather conditions in both deterministic and stochastic scenarios, we demonstrate that our algorithm, rule-based incorporated control regularization (RUBICON), outperforms state-of-the-art methods in offline settings by and improves the baseline method by in online settings with respect to a reward consisting of thermal comfort and energy consumption in building-RL environments.","PeriodicalId":359352,"journal":{"name":"Proceedings of the 14th ACM International Conference on Future Energy Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Rule-based Policy Regularization for Reinforcement Learning-based Building Control\",\"authors\":\"Hsin-Yu Liu, Bharathan Balaji, Rajesh K. Gupta, Dezhi Hong\",\"doi\":\"10.1145/3575813.3595202\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Rule-based control (RBC) is widely adopted in buildings due to its stability and robustness. It resembles a behavior cloning methodology refined by human experts; however, it is incapable of adapting to distribution drifts. Reinforcement learning (RL) can adapt to changes but needs to learn from scratch in the online setting. On the other hand, the learning ability is limited in offline settings due to extrapolation errors caused by selecting out-of-distribution actions. In this paper, we explore how to incorporate RL with a rule-based control policy to combine their strengths to continuously learn a scalable and robust policy in both online and offline settings. We start with representative online and offline RL methods, TD3 and TD3+BC, respectively. Then, we develop a dynamically weighted actor loss function to selectively choose which policy for RL models to learn from at each training iteration. 
With extensive experiments across various weather conditions in both deterministic and stochastic scenarios, we demonstrate that our algorithm, rule-based incorporated control regularization (RUBICON), outperforms state-of-the-art methods in offline settings by and improves the baseline method by in online settings with respect to a reward consisting of thermal comfort and energy consumption in building-RL environments.\",\"PeriodicalId\":359352,\"journal\":{\"name\":\"Proceedings of the 14th ACM International Conference on Future Energy Systems\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 14th ACM International Conference on Future Energy Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3575813.3595202\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th ACM International Conference on Future Energy Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3575813.3595202","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Rule-based control (RBC) is widely adopted in buildings due to its stability and robustness. It resembles a behavior cloning methodology refined by human experts; however, it cannot adapt to distribution drifts. Reinforcement learning (RL) can adapt to changes but must learn from scratch in the online setting, and its learning ability is limited in offline settings due to extrapolation errors caused by selecting out-of-distribution actions. In this paper, we explore how to incorporate RL with a rule-based control policy, combining their strengths to continuously learn a scalable and robust policy in both online and offline settings. We start from representative online and offline RL methods, TD3 and TD3+BC, respectively. We then develop a dynamically weighted actor loss function that selects which policy the RL model learns from at each training iteration. With extensive experiments across various weather conditions in both deterministic and stochastic scenarios, we demonstrate that our algorithm, rule-based incorporated control regularization (RUBICON), outperforms state-of-the-art methods in offline settings and improves on the baseline method in online settings, with respect to a reward combining thermal comfort and energy consumption in building-RL environments.
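The central idea in the abstract is an actor loss that blends the TD3 objective with regularization toward both the dataset actions (as in TD3+BC) and a rule-based controller, with weights that change over training. The sketch below is a minimal PyTorch illustration of how such a loss could be assembled; the function and parameter names (rule_policy, w_bc, w_rule), the MSE regularizers, and the weighting interface are assumptions for illustration, not the paper's released implementation.

```python
import torch.nn.functional as F

def weighted_actor_loss(actor, critic, rule_policy, states, dataset_actions,
                        w_bc, w_rule):
    """Hypothetical dynamically weighted actor loss in the spirit of RUBICON.

    actor, critic   : TD3-style policy and Q-network (torch.nn.Module)
    rule_policy     : callable mapping states -> rule-based control actions
    dataset_actions : actions from the offline dataset (TD3+BC-style BC term)
    w_bc, w_rule    : weights chosen per training iteration, deciding which
                      policy the actor learns from at this step
    """
    pi = actor(states)

    # TD3 objective: maximize Q(s, pi(s)), i.e. minimize its negative.
    q = critic(states, pi)
    # TD3+BC rescales the Q term by its mean magnitude so the RL and
    # regularization terms stay on comparable scales.
    lam = 1.0 / (q.abs().mean().detach() + 1e-8)
    rl_term = -lam * q.mean()

    # Regularization toward the logged dataset actions (behavior cloning)
    # and toward the rule-based controller's actions for the same states.
    bc_term = F.mse_loss(pi, dataset_actions)
    rule_term = F.mse_loss(pi, rule_policy(states))

    return rl_term + w_bc * bc_term + w_rule * rule_term
```

In an online phase the behavior-cloning term could be dropped (w_bc = 0), leaving only the rule-based regularizer; how w_bc and w_rule are scheduled across iterations is exactly what the paper's dynamic weighting addresses, and the schedule is not reproduced here.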