Advanced day-ahead scheduling of HVAC demand response control using novel strategy of Q-learning, model predictive control, and input convex neural networks
IF 9.6 · Q1 · Computer Science, Artificial Intelligence
{"title":"Advanced day-ahead scheduling of HVAC demand response control using novel strategy of Q-learning, model predictive control, and input convex neural networks","authors":"Rahman Heidarykiany, Cristinel Ababei","doi":"10.1016/j.egyai.2025.100509","DOIUrl":null,"url":null,"abstract":"<div><div>In this paper, we present a Q-Learning optimization algorithm for smart home HVAC systems. The proposed algorithm combines new convex deep neural network models with model predictive control (MPC) techniques. More specifically, new input convex long short-term memory (ICLSTM) models are employed to predict dynamic states in an MPC optimal control technique integrated within a Q-Learning reinforcement learning (RL) algorithm to further improve the learned temporal behaviors of nonlinear HVAC systems. As a novel RL approach, the proposed algorithm generates day-ahead HVAC demand response (DR) signals in smart homes that optimally reduce and/or shift peak energy usage, reduce electricity costs, minimize user discomfort, and honor in a best-effort way the recommendations from utility/aggregator, which in turn has impact on the overall well being of the distribution network controlled by the aggregator. The proposed Q-Learning optimization algorithm, based on epsilon-model predictive control (<span><math><mi>ϵ</mi></math></span>-MPC), can be implemented as a control agent that is executed by the smart house energy management (SHEM) system that we assume exists in the smart home, which can interact with the energy provider of the distribution network, i.e., utility/aggregator, via the smart meter. The output generated by the proposed control agent represents day-ahead local DR signals in the form of temperature setpoints for the HVAC system that are found by the optimization process to lead to desired trade-offs between electricity cost and user discomfort. The proposed algorithm can be used in smart homes with passive HVAC controllers, which solely react to end-user setpoints, to transform them into smart homes with active HVAC controllers. Such systems not only respond to the preferences of the end-user but also incorporate an external control signal provided by the utility or aggregator. Simulation experiments conducted with a custom simulation tool demonstrate that the proposed optimization framework can offer significant benefits. It achieves 87% higher success rate in optimizing setpoints in the desired range, thereby resulting in up to 15% energy savings and zero temperature discomfort.</div></div>","PeriodicalId":34138,"journal":{"name":"Energy and AI","volume":"20 ","pages":"Article 100509"},"PeriodicalIF":9.6000,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Energy and AI","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666546825000412","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In this paper, we present a Q-Learning optimization algorithm for smart home HVAC systems. The proposed algorithm combines new convex deep neural network models with model predictive control (MPC) techniques. More specifically, new input convex long short-term memory (ICLSTM) models are employed to predict dynamic states in an MPC optimal control technique integrated within a Q-Learning reinforcement learning (RL) algorithm to further improve the learned temporal behaviors of nonlinear HVAC systems. As a novel RL approach, the proposed algorithm generates day-ahead HVAC demand response (DR) signals in smart homes that optimally reduce and/or shift peak energy usage, reduce electricity costs, minimize user discomfort, and honor, in a best-effort way, the recommendations from the utility/aggregator, which in turn has an impact on the overall well-being of the distribution network controlled by the aggregator. The proposed Q-Learning optimization algorithm, based on epsilon-model predictive control (ε-MPC), can be implemented as a control agent executed by the smart house energy management (SHEM) system that we assume exists in the smart home and that can interact with the energy provider of the distribution network, i.e., the utility/aggregator, via the smart meter. The output generated by the proposed control agent represents day-ahead local DR signals in the form of temperature setpoints for the HVAC system that the optimization process finds to lead to the desired trade-offs between electricity cost and user discomfort. The proposed algorithm can be used in smart homes with passive HVAC controllers, which react solely to end-user setpoints, to transform them into smart homes with active HVAC controllers. Such systems not only respond to the preferences of the end user but also incorporate an external control signal provided by the utility or aggregator. Simulation experiments conducted with a custom simulation tool demonstrate that the proposed optimization framework can offer significant benefits. It achieves an 87% higher success rate in optimizing setpoints within the desired range, thereby resulting in up to 15% energy savings and zero temperature discomfort.
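The abstract describes the control idea only at a high level; the paper itself should be consulted for the exact ε-MPC formulation and the ICLSTM architecture. As a rough illustration of the kind of loop described, the following is a minimal sketch, not the authors' implementation: epsilon-greedy tabular Q-learning over hourly temperature setpoints for a 24-hour day-ahead horizon, with a toy first-order thermal model standing in for the ICLSTM dynamics predictor. The tariff, comfort band, cost/discomfort weighting, and the `predict_step` helper are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code): epsilon-greedy Q-learning over
# hourly temperature setpoints for a day-ahead horizon. A toy first-order thermal
# model stands in for the paper's input convex LSTM (ICLSTM) dynamics predictor;
# the tariff, comfort band, and weights are illustrative assumptions.
import numpy as np

HOURS = 24
SETPOINTS = np.arange(20.0, 26.5, 0.5)                         # candidate setpoints [deg C]
PRICE = np.array([0.10] * 6 + [0.20] * 12 + [0.30] * 4 + [0.10] * 2)  # assumed TOU tariff [$/kWh]
OUTDOOR = 30.0                                                  # assumed constant outdoor temp [deg C]
COMFORT = (21.0, 24.0)                                          # assumed comfort band [deg C]
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1                              # learning rate, discount, exploration

def predict_step(indoor, setpoint):
    """Placeholder for the ICLSTM dynamics model: returns next indoor
    temperature and HVAC energy use for one hour under a given setpoint."""
    energy = max(0.0, 0.5 * (indoor - setpoint))                # kWh, toy cooling model
    next_indoor = indoor + 0.1 * (OUTDOOR - indoor) - 0.8 * energy
    return next_indoor, energy

def reward(hour, energy, indoor):
    """Negative electricity cost minus a discomfort penalty (assumed weighting)."""
    low, high = COMFORT
    discomfort = max(0.0, low - indoor) + max(0.0, indoor - high)
    return -(PRICE[hour] * energy + 2.0 * discomfort)

# Tabular Q-function indexed by hour and setpoint choice.
Q = np.zeros((HOURS, len(SETPOINTS)))
rng = np.random.default_rng(0)

for episode in range(2000):
    indoor = 24.0
    for hour in range(HOURS):
        # Epsilon-greedy action selection over the discrete setpoint set.
        if rng.random() < EPS:
            a = rng.integers(len(SETPOINTS))
        else:
            a = int(np.argmax(Q[hour]))
        indoor_next, energy = predict_step(indoor, SETPOINTS[a])
        r = reward(hour, energy, indoor_next)
        next_q = Q[hour + 1].max() if hour + 1 < HOURS else 0.0
        Q[hour, a] += ALPHA * (r + GAMMA * next_q - Q[hour, a])
        indoor = indoor_next

# Day-ahead DR schedule: greedy setpoint per hour from the learned Q-table.
schedule = [float(SETPOINTS[int(np.argmax(Q[h]))]) for h in range(HOURS)]
print(schedule)
```

In the paper's setting, the placeholder one-step predictor would be replaced by the trained ICLSTM rolled out over the MPC horizon, and the reward would additionally reflect the utility/aggregator DR recommendations; the resulting schedule is what the SHEM system would issue to the HVAC controller as day-ahead setpoints.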