Intra-day dispatch method via deep reinforcement learning based on pre-training and expert knowledge

Impact Factor: 5.0 · CAS Region 2 (Engineering) · JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Yanbo Chen, Qintao Du, Huayu Dong, Tao Huang, Jiahao Ma, Zitao Xu, Zhihao Wang
DOI: 10.1016/j.ijepes.2025.110719
Journal: International Journal of Electrical Power & Energy Systems, Volume 169, Article 110719
Published: 2025-05-13
Citations: 0

Abstract

Traditional economic dispatch algorithms rely on the accuracy of all model parameters and lack adaptability to the high uncertainty brought by the dynamic changes in modern power systems; their computational efficiency also degrades as operational complexity grows. In recent years, owing to its strong self-learning and self-optimization ability, reinforcement learning has emerged in the field of economic dispatch, where it can solve model-free dynamic programming problems that traditional optimization methods cannot handle effectively. In this paper, we construct a reinforcement learning agent for intra-day dispatch that optimizes generator output, using a twin delayed deep deterministic policy gradient algorithm based on pre-training and expert knowledge (PEK-TD3). To address the long exploration time and poor convergence of conventional deep reinforcement learning, we propose an initial policy network training method based on supervised pre-training, which significantly speeds up the training process of deep reinforcement learning and greatly shortens the model development cycle. At the same time, expert knowledge is embedded in the deep reinforcement learning procedure to guide the training of the agent. Under this guidance, the agent, on the one hand, quickly learns to restrict its search to the feasible region of power system operation, which improves convergence; on the other hand, to obtain higher rewards, the agent learns to prioritize renewable energy utilization, which significantly reduces the renewable energy curtailment rate. Finally, a modified IEEE 118-node system is used to verify the performance of the proposed method.
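The abstract's two key ideas, supervised pre-training of the policy from expert dispatch data and expert-knowledge reward shaping that penalizes infeasible actions and renewable curtailment, can be illustrated with a deliberately tiny sketch. This is not the paper's PEK-TD3 implementation: the one-generator-plus-renewable setting, the linear policy, and the penalty weights below are all hypothetical stand-ins chosen to show the structure of the approach.

```python
import random

random.seed(0)

# Toy setting: one dispatchable generator plus one renewable unit serve a
# varying load. The action is the dispatchable output in [0, 1].
# "Expert" data stands in for an offline optimal-dispatch solver: renewables
# have zero marginal cost, so the expert sets output to max(load - renewable, 0).
def expert_action(load, renewable):
    return max(load - renewable, 0.0)

# Hypothetical linear policy a(s) = w0*load + w1*renewable + b.
# Pre-training = supervised regression on expert (state, action) pairs,
# mirroring the paper's supervised pre-training of the initial policy network.
w = [0.0, 0.0]
b = 0.0
data = [(random.uniform(0.5, 1.0), random.uniform(0.0, 0.5)) for _ in range(200)]

lr, epochs = 0.5, 6000
for _ in range(epochs):  # plain batch gradient descent on squared error
    gw = [0.0, 0.0]
    gb = 0.0
    for load, ren in data:
        err = (w[0] * load + w[1] * ren + b) - expert_action(load, ren)
        gw[0] += err * load
        gw[1] += err * ren
        gb += err
    n = len(data)
    w[0] -= lr * gw[0] / n
    w[1] -= lr * gw[1] / n
    b -= lr * gb / n

# Expert-knowledge reward shaping (illustrative weights): reward the agent for
# balancing the load, penalize leaving the feasible output range, and penalize
# curtailing renewable energy so the agent learns to use renewables first.
def shaped_reward(action, load, renewable):
    feasibility_penalty = 10.0 * (max(0.0, action - 1.0) + max(0.0, -action))
    used_renewable = min(renewable, max(load - action, 0.0))
    curtailment_penalty = 2.0 * (renewable - used_renewable)
    balance_error = abs(action + used_renewable - load)
    return -balance_error - feasibility_penalty - curtailment_penalty
```

In a full pipeline, the pre-trained weights would initialize the TD3 actor before off-policy exploration begins, and the shaped reward would replace a sparse cost signal during training; both changes narrow the search toward feasible, low-curtailment dispatches from the first episode.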
Source journal: International Journal of Electrical Power & Energy Systems (Engineering: Electrical & Electronic)
CiteScore: 12.10
Self-citation rate: 17.30%
Annual article count: 1022
Review time: 51 days
Journal description: The journal covers theoretical developments in electrical power and energy systems and their applications. The coverage embraces: generation and network planning; reliability; long and short term operation; expert systems; neural networks; object oriented systems; system control centres; database and information systems; stock and parameter estimation; system security and adequacy; network theory, modelling and computation; small and large system dynamics; dynamic model identification; on-line control including load and switching control; protection; distribution systems; energy economics; impact of non-conventional systems; and man-machine interfaces. As well as original research papers, the journal publishes short contributions, book reviews and conference reports. All papers are peer-reviewed by at least two referees.