Rule-based Policy Regularization for Reinforcement Learning-based Building Control
Hsin-Yu Liu, Bharathan Balaji, Rajesh K. Gupta, Dezhi Hong
Proceedings of the 14th ACM International Conference on Future Energy Systems, June 16, 2023. DOI: 10.1145/3575813.3595202
Rule-based control (RBC) is widely adopted in buildings due to its stability and robustness. It resembles a behavior-cloning policy refined by human experts, but it cannot adapt to distribution drift. Reinforcement learning (RL) can adapt to changes, yet it must learn from scratch in the online setting, and in the offline setting its learning is limited by extrapolation errors caused by selecting out-of-distribution actions. In this paper, we explore how to incorporate RL with a rule-based control policy, combining their strengths to continuously learn a scalable and robust policy in both online and offline settings. We start from representative online and offline RL methods, TD3 and TD3+BC, respectively, and develop a dynamically weighted actor loss function that selects which policy the RL model learns from at each training iteration. Through extensive experiments across various weather conditions, in both deterministic and stochastic scenarios, we demonstrate that our algorithm, rule-based incorporated control regularization (RUBICON), outperforms state-of-the-art methods in offline settings and improves on the baseline method in online settings, with respect to a reward combining thermal comfort and energy consumption in building-RL environments.
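To make the idea of a dynamically weighted actor loss more concrete, the following PyTorch sketch illustrates one way such a loss could blend a TD3-style objective with regularization toward a rule-based controller. This is a minimal illustration, not the authors' implementation: the network architectures, the MSE behavior-cloning term, and the names `Actor`, `Critic`, `rule_policy`, and `weight` are all assumptions; RUBICON's actual loss and weighting schedule may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy network (TD3-style actor). Hypothetical architecture."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-network over (state, action) pairs. Hypothetical architecture."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def weighted_actor_loss(actor, critic, rule_policy, states, weight):
    """Dynamically weighted actor loss (illustrative).

    Blends the TD3 actor objective (maximize Q(s, pi(s))) with a
    behavior-cloning-style term that pulls the actor toward the
    rule-based controller's actions. `weight` in [0, 1] decides, per
    training iteration, how much the actor learns from the rule-based
    policy versus the critic.
    """
    actions = actor(states)
    rl_loss = -critic(states, actions).mean()       # TD3-style policy-gradient term
    with torch.no_grad():
        rule_actions = rule_policy(states)           # actions the RBC would take
    bc_loss = F.mse_loss(actions, rule_actions)      # regularize toward the RBC
    return (1.0 - weight) * rl_loss + weight * bc_loss

# Minimal usage example with random data and a stub rule-based policy.
if __name__ == "__main__":
    state_dim, action_dim = 10, 3
    actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
    rule_policy = lambda s: torch.clamp(s[:, :action_dim], -1.0, 1.0)  # stub RBC
    states = torch.randn(64, state_dim)
    loss = weighted_actor_loss(actor, critic, rule_policy, states, weight=0.5)
    loss.backward()
    print(float(loss))
```

In this sketch the weight is a fixed scalar; the abstract's "dynamically weighted" loss implies the weight would instead be recomputed at each training iteration to decide whether the model should follow the rule-based policy or its own critic.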