Rule-based Policy Regularization for Reinforcement Learning-based Building Control
Hsin-Yu Liu, Bharathan Balaji, Rajesh K. Gupta, Dezhi Hong
Proceedings of the 14th ACM International Conference on Future Energy Systems, June 16, 2023. DOI: 10.1145/3575813.3595202
Rule-based control (RBC) is widely adopted in buildings due to its stability and robustness. It resembles a behavior-cloning policy refined by human experts, but it cannot adapt to distribution drift. Reinforcement learning (RL) can adapt to changes, yet it must learn from scratch in the online setting, and in the offline setting its learning is limited by extrapolation errors caused by selecting out-of-distribution actions. In this paper, we explore how to incorporate RL with a rule-based control policy, combining their strengths to continuously learn a scalable and robust policy in both online and offline settings. We start from representative online and offline RL methods, TD3 and TD3+BC, respectively, and develop a dynamically weighted actor loss function that selects which policy the RL model learns from at each training iteration. Through extensive experiments across various weather conditions, in both deterministic and stochastic scenarios, we demonstrate that our algorithm, rule-based incorporated control regularization (RUBICON), outperforms state-of-the-art methods in offline settings and improves on the baseline method in online settings, with respect to a reward combining thermal comfort and energy consumption in building-RL environments.
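To make the idea of a dynamically weighted actor loss more concrete, the following PyTorch sketch illustrates one way such a loss could blend a TD3-style objective with regularization toward a rule-based controller. This is a minimal illustration, not the authors' implementation: the network architectures, the MSE behavior-cloning term, and the names `Actor`, `Critic`, `rule_policy`, and `weight` are all assumptions; RUBICON's actual loss and weighting schedule may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy network (TD3-style actor). Hypothetical architecture."""
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-network over (state, action) pairs. Hypothetical architecture."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def weighted_actor_loss(actor, critic, rule_policy, states, weight):
    """Dynamically weighted actor loss (illustrative).

    Blends the TD3 actor objective (maximize Q(s, pi(s))) with a
    behavior-cloning-style term that pulls the actor toward the
    rule-based controller's actions. `weight` in [0, 1] decides, per
    training iteration, how much the actor learns from the rule-based
    policy versus the critic.
    """
    actions = actor(states)
    rl_loss = -critic(states, actions).mean()       # TD3-style policy-gradient term
    with torch.no_grad():
        rule_actions = rule_policy(states)           # actions the RBC would take
    bc_loss = F.mse_loss(actions, rule_actions)      # regularize toward the RBC
    return (1.0 - weight) * rl_loss + weight * bc_loss

# Minimal usage example with random data and a stub rule-based policy.
if __name__ == "__main__":
    state_dim, action_dim = 10, 3
    actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
    rule_policy = lambda s: torch.clamp(s[:, :action_dim], -1.0, 1.0)  # stub RBC
    states = torch.randn(64, state_dim)
    loss = weighted_actor_loss(actor, critic, rule_policy, states, weight=0.5)
    loss.backward()
    print(float(loss))
```

In this sketch the weight is a fixed scalar; the abstract's "dynamically weighted" loss implies the weight would instead be recomputed at each training iteration to decide whether the model should follow the rule-based policy or its own critic.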