Safe HVAC Control via Batch Reinforcement Learning
Hsin-Yu Liu, Bharathan Balaji, Sicun Gao, Rajesh K. Gupta, Dezhi Hong
2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS), May 2022. DOI: 10.1109/iccps54341.2022.00023
Citations: 8
Abstract
Buildings account for 30% of energy use worldwide, and approximately half of it is ascribed to HVAC systems. Reinforcement Learning (RL) has improved upon traditional control methods in increasing the energy efficiency of HVAC systems. However, prior works use online RL methods that require configuring complex thermal simulators to train, or use historical data-driven thermal models that can take at least 10^4 time steps to reach rule-based performance. Also, due to the distribution drift from simulator to real buildings, RL solutions are seldom deployed in the real world. On the other hand, batch RL methods can learn from the historical data and improve upon the existing policy without any interactions with the real buildings or simulators during training. With the existing rule-based policy as the prior, the policies learned with batch RL are better than the existing control from the first day of deployment, with very few training steps compared with online methods. Our algorithm incorporates a Kullback-Leibler (KL) regularization term to penalize policies that deviate far from the previous ones. We evaluate our framework on a real multi-zone, multi-floor building: it achieves a 7.2% energy reduction compared to the state-of-the-art batch RL method, outperforms other batch RL methods in occupants' thermal comfort, and achieves a 16.7% energy reduction compared to the default rule-based control.
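The abstract does not spell out the exact objective, but the core idea of KL-regularized batch RL can be illustrated with a minimal sketch. The snippet below, in PyTorch, shows a policy loss that maximizes expected Q-values while penalizing divergence from a prior (e.g., the rule-based behavior) policy. The function name, the discrete action space, the `beta` coefficient, and the uniform prior stand-in are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def kl_regularized_policy_loss(q_values, policy_logits, prior_logits, beta=0.1):
    """Sketch of a KL-regularized batch RL policy loss (assumed form).

    q_values:      (batch, n_actions) Q-value estimates from a learned critic
    policy_logits: (batch, n_actions) logits of the policy being trained
    prior_logits:  (batch, n_actions) logits of the prior (e.g., rule-based) policy
    beta:          weight on the KL penalty (hypothetical hyperparameter)
    """
    policy = F.softmax(policy_logits, dim=-1)
    log_policy = F.log_softmax(policy_logits, dim=-1)
    log_prior = F.log_softmax(prior_logits, dim=-1)

    # Expected Q-value under the current policy (the term to maximize)
    expected_q = (policy * q_values).sum(dim=-1)

    # KL(pi || prior) penalizes deviation from the prior policy,
    # keeping the learned policy close to known-safe behavior
    kl = (policy * (log_policy - log_prior)).sum(dim=-1)

    # Minimize: -E[Q] + beta * KL
    return (-expected_q + beta * kl).mean()

# Example usage with a hypothetical discrete action space of 5 setpoint choices
q = torch.randn(32, 5)                           # critic's Q estimates for a batch
pi_logits = torch.randn(32, 5, requires_grad=True)
prior_logits = torch.zeros(32, 5)                # uniform prior as a stand-in
loss = kl_regularized_policy_loss(q, pi_logits, prior_logits)
loss.backward()
```

A higher `beta` keeps the learned policy closer to the existing control, which is consistent with the paper's motivation: safe improvement from day one of deployment, without online interaction.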