{"title":"Reinforcement learning aided smart-home decision-making in an interactive smart grid","authors":"Ding Li, S. Jayaweera","doi":"10.1109/IGESC.2014.7018632","DOIUrl":null,"url":null,"abstract":"In this paper, a complete hierarchical architecture is presented for the Utility-customer interaction, which tightly connect several important research topics, such as customer load prediction, renewable generation integration, power-load balancing and demand response. The complete interaction cycle consists of two stages: (1) Initial interaction (long-term planning) and (2) Real-time interaction (short-term planning). A hidden mode Markov decision process (HM-MDP) model is developed for customer real-time decision making, which outperforms the conventional Markov decision process (MDP) model in handling the non-stationary environment. To obtain a low-complexity, real-time algorithm, that allows to adaptively incorporate new observations as the environment changes, we resort to Q-learning based approximate dynamic programming (ADP). Without requiring specific starting and ending points of the scheduling period, the Q-learning algorithm offers more flexibility in practice. Performance analysis of both exact and approximate algorithms are presented with simulation results, in comparison with other decision making strategies.","PeriodicalId":372982,"journal":{"name":"2014 IEEE Green Energy and Systems Conference (IGESC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE Green Energy and Systems Conference (IGESC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IGESC.2014.7018632","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 16
Abstract
In this paper, a complete hierarchical architecture is presented for the utility-customer interaction, tightly connecting several important research topics such as customer load prediction, renewable generation integration, power-load balancing, and demand response. The complete interaction cycle consists of two stages: (1) initial interaction (long-term planning) and (2) real-time interaction (short-term planning). A hidden-mode Markov decision process (HM-MDP) model is developed for customer real-time decision making, which outperforms the conventional Markov decision process (MDP) model in handling the non-stationary environment. To obtain a low-complexity, real-time algorithm that can adaptively incorporate new observations as the environment changes, we resort to Q-learning-based approximate dynamic programming (ADP). Because it does not require specific starting and ending points for the scheduling period, the Q-learning algorithm offers more flexibility in practice. Performance analyses of both the exact and approximate algorithms are presented with simulation results, in comparison with other decision-making strategies.
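The abstract gives only a high-level view of the Q-learning-based ADP component; the paper itself defines the actual HM-MDP formulation. As a rough, non-authoritative illustration of the underlying Q-learning update, here is a minimal tabular sketch on a toy smart-home load-scheduling problem. The state space, actions, reward, and price dynamics below are all hypothetical assumptions for the sake of a runnable example, not the paper's model.

```python
# Minimal tabular Q-learning sketch for a toy smart-home scheduling task.
# Illustrative assumption only, NOT the paper's HM-MDP: states, actions,
# rewards, and the toy price dynamics below are all hypothetical.
import random

# Hypothetical discretization: state = (hour-of-day bucket, price level),
# action = 0 (defer the load) or 1 (run the load now).
HOURS, PRICES, ACTIONS = 4, 3, 2
Q = {((h, p), a): 0.0
     for h in range(HOURS) for p in range(PRICES) for a in range(ACTIONS)}

alpha, gamma, epsilon = 0.1, 0.95, 0.1  # step size, discount, exploration rate

def step(state, action):
    """Toy environment: running the load costs the current price; deferring
    incurs a small discomfort penalty. The price drifts randomly, a crude
    stand-in for the non-stationary environment the HM-MDP targets."""
    hour, price = state
    reward = -float(price) if action == 1 else -0.3
    next_price = max(0, min(PRICES - 1, price + random.choice([-1, 0, 1])))
    return ((hour + 1) % HOURS, next_price), reward

def greedy(state):
    return max(range(ACTIONS), key=lambda a: Q[(state, a)])

state = (0, 1)
for _ in range(50_000):
    # Epsilon-greedy exploration over the current Q-table.
    action = random.randrange(ACTIONS) if random.random() < epsilon else greedy(state)
    next_state, reward = step(state, action)
    # Standard Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward + gamma * max(Q[(next_state, a)] for a in range(ACTIONS))
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    state = next_state

# Inspect the learned values for one example state (hour 0, highest price).
print({sa: round(v, 2) for sa, v in Q.items() if sa[0] == (0, 2)})
```

Note how the loop runs as an open-ended, infinite-horizon process: this mirrors the flexibility the abstract attributes to Q-learning, which needs no fixed start or end point for the scheduling period.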