{"title":"基于价格的需求响应方案设计的离线强化学习","authors":"Ce Xu, Bo Liu, Yue Zhao","doi":"10.1109/CISS56502.2023.10089681","DOIUrl":null,"url":null,"abstract":"In this paper, price-based demand response (DR) program design by offline Reinforcement Learning (RL) with data collected from smart meters is studied. Unlike online RL approaches, offline RL does not need to interact with consumers in the real world and thus has great cost-effectiveness and safety advantages. A sequential decision-making process with a Markov Decision Process (MDP) framework is formulated. A novel data augmentation method based on bootstrapping is developed. Deep Q-network (DQN)-based offline RL and policy evaluation algorithms are developed to design high-performance DR pricing policies. The developed offline learning methods are evaluated on both a real-world data set and simulation environments. It is demonstrated that the performance of the developed offline RL methods achieve excellent performance that is very close to the ideal performance bound provided by the state-of-the-art online RL algorithms.","PeriodicalId":243775,"journal":{"name":"2023 57th Annual Conference on Information Sciences and Systems (CISS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Offline Reinforcement Learning for Price-Based Demand Response Program Design\",\"authors\":\"Ce Xu, Bo Liu, Yue Zhao\",\"doi\":\"10.1109/CISS56502.2023.10089681\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, price-based demand response (DR) program design by offline Reinforcement Learning (RL) with data collected from smart meters is studied. Unlike online RL approaches, offline RL does not need to interact with consumers in the real world and thus has great cost-effectiveness and safety advantages. A sequential decision-making process with a Markov Decision Process (MDP) framework is formulated. A novel data augmentation method based on bootstrapping is developed. Deep Q-network (DQN)-based offline RL and policy evaluation algorithms are developed to design high-performance DR pricing policies. The developed offline learning methods are evaluated on both a real-world data set and simulation environments. 
It is demonstrated that the performance of the developed offline RL methods achieve excellent performance that is very close to the ideal performance bound provided by the state-of-the-art online RL algorithms.\",\"PeriodicalId\":243775,\"journal\":{\"name\":\"2023 57th Annual Conference on Information Sciences and Systems (CISS)\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-03-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 57th Annual Conference on Information Sciences and Systems (CISS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CISS56502.2023.10089681\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 57th Annual Conference on Information Sciences and Systems (CISS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CISS56502.2023.10089681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Offline Reinforcement Learning for Price-Based Demand Response Program Design
In this paper, price-based demand response (DR) program design via offline Reinforcement Learning (RL), using data collected from smart meters, is studied. Unlike online RL approaches, offline RL does not require interacting with consumers in the real world and thus offers substantial cost-effectiveness and safety advantages. The pricing problem is formulated as a sequential decision-making process under a Markov Decision Process (MDP) framework. A novel data augmentation method based on bootstrapping is developed, and Deep Q-network (DQN)-based offline RL and policy evaluation algorithms are developed to design high-performance DR pricing policies. The developed offline learning methods are evaluated on a real-world data set and in simulation environments. It is demonstrated that the developed offline RL methods achieve performance very close to the ideal bound provided by state-of-the-art online RL algorithms.
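The abstract names, but does not detail, the bootstrapping-based data augmentation. A minimal sketch of the general idea is shown below: logged smart-meter transitions are resampled with replacement to create multiple augmented training datasets. The function name `bootstrap_augment` and the transition tuple layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def bootstrap_augment(transitions, num_datasets, rng=None):
    """Create bootstrapped copies of a logged transition dataset.

    transitions: list of (state, action, reward, next_state) tuples
    num_datasets: number of resampled datasets to generate

    Note: this is a generic bootstrap resampling sketch, not the
    paper's specific augmentation scheme.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(transitions)
    augmented = []
    for _ in range(num_datasets):
        # Sample n indices with replacement (a standard bootstrap resample).
        idx = rng.integers(0, n, size=n)
        augmented.append([transitions[i] for i in idx])
    return augmented
```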
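Likewise, the DQN-based offline RL step can be pictured as fitted Q-learning over a fixed transition log, with discretized prices as actions. The sketch below makes the key offline property explicit: Bellman targets are computed purely from logged data, with no interaction with real consumers. The state/action dimensions, network size, and hyperparameters are assumptions for illustration and do not reflect the paper's actual setup.

```python
import random
import torch
import torch.nn as nn

# Illustrative dimensions: 8-dimensional state (e.g., consumption features),
# 10 discretized price levels as actions. These are assumptions.
STATE_DIM, NUM_PRICES, GAMMA = 8, 10, 0.99

q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_PRICES))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_PRICES))
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def train_offline(dataset, epochs=100, batch_size=64, target_sync=500):
    """dataset: fixed list of (state, price_idx, reward, next_state) transitions
    from smart-meter logs; transitions are assumed non-terminal for simplicity."""
    step = 0
    for _ in range(epochs):
        random.shuffle(dataset)
        for i in range(0, len(dataset) - batch_size + 1, batch_size):
            batch = dataset[i:i + batch_size]
            s = torch.tensor([b[0] for b in batch], dtype=torch.float32)
            a = torch.tensor([b[1] for b in batch], dtype=torch.int64)
            r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            s2 = torch.tensor([b[3] for b in batch], dtype=torch.float32)
            with torch.no_grad():
                # Bellman target built only from logged data: no environment calls.
                target = r + GAMMA * target_net(s2).max(dim=1).values
            q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            loss = nn.functional.mse_loss(q, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step % target_sync == 0:
                target_net.load_state_dict(q_net.state_dict())
```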