Yanbo Chen, Qintao Du, Huayu Dong, Tao Huang, Jiahao Ma, Zitao Xu, Zhihao Wang
International Journal of Electrical Power & Energy Systems, Volume 169, Article 110719. Published 13 May 2025. DOI: 10.1016/j.ijepes.2025.110719
Intra-day dispatch method via deep reinforcement learning based on pre-training and expert knowledge
Traditional economic dispatch algorithms rely on the accuracy of all model parameters and lack adaptability to the high uncertainty introduced by the dynamic changes in modern power systems. Their computational efficiency also needs to improve as operational complexity grows. In recent years, owing to its strong self-learning and self-optimization ability, reinforcement learning has emerged in the field of economic dispatch, where it can solve model-free dynamic programming problems that traditional optimization methods cannot handle effectively. In this paper, we construct a reinforcement learning agent for intra-day dispatch that optimizes generator output, using a twin delayed deep deterministic policy gradient algorithm based on pre-training and expert knowledge (PEK-TD3). To address the long exploration time and poor convergence of conventional deep reinforcement learning, we propose an initial policy network training method based on supervised pre-training, which significantly speeds up the training of deep reinforcement learning and greatly shortens the model development cycle. At the same time, expert knowledge is embedded in the deep reinforcement learning process to guide the training of the agent. With this guidance, on the one hand, the agent quickly learns to restrict its search to the feasible region of power system operation, improving convergence; on the other hand, to obtain higher rewards, the agent learns to prioritize renewable energy utilization, significantly reducing the renewable curtailment rate. Finally, a modified IEEE 118-node system is used to verify the performance of the proposed method.
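The two ideas the abstract combines can be illustrated in miniature. The sketch below is only a toy stand-in, not the paper's implementation: a linear policy fitted by least squares plays the role of the supervised pre-training of the initial policy network (the paper uses a deep network inside TD3), and a hand-weighted shaped reward stands in for the embedded expert knowledge that penalizes infeasible operating points and renewable curtailment. All names, shapes, and weights here are illustrative assumptions.

```python
import numpy as np

# --- Supervised pre-training of the initial policy (behavior cloning) ---
# Fit a policy to expert (state, action) pairs so the RL agent starts
# from a sensible dispatch policy instead of random exploration.
rng = np.random.default_rng(0)
W_expert = rng.normal(size=(3, 2))          # hidden "expert" mapping (toy)
states = rng.normal(size=(200, 3))          # 200 expert dispatch states
actions = states @ W_expert                 # expert generator set-points

# Least-squares fit: the toy analogue of minimizing supervised loss
# between the policy network's output and the expert actions.
W_hat, *_ = np.linalg.lstsq(states, actions, rcond=None)

def policy(s):
    """Pre-trained initial policy: maps a dispatch state to generator outputs."""
    return s @ W_hat

# --- Expert-knowledge reward shaping (illustrative weights) ---
def shaped_reward(cost, violation, curtailment,
                  w_violation=10.0, w_curtail=5.0):
    """Penalize constraint violations heavily, so the agent keeps its
    search inside the feasible operating region, and penalize renewable
    curtailment, so it learns to prioritize renewable utilization."""
    return -cost - w_violation * violation - w_curtail * curtailment
```

A feasible, zero-curtailment dispatch with cost 100 receives reward -100.0, while the same cost with one unit of violation and two units of curtailment drops to -120.0; the gap is what steers the agent toward feasible, renewable-friendly dispatch.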
About the journal:
The journal covers theoretical developments in electrical power and energy systems and their applications. The coverage embraces: generation and network planning; reliability; long and short term operation; expert systems; neural networks; object oriented systems; system control centres; database and information systems; state and parameter estimation; system security and adequacy; network theory, modelling and computation; small and large system dynamics; dynamic model identification; on-line control including load and switching control; protection; distribution systems; energy economics; impact of non-conventional systems; and man-machine interfaces.
As well as original research papers, the journal publishes short contributions, book reviews and conference reports. All papers are peer-reviewed by at least two referees.