Title: Automatic voltage control considering demand response: Approximatively completed observed Markov decision process-based reinforcement learning scheme
Journal: International Journal of Electrical Power & Energy Systems (JCR Q1, Engineering, Electrical & Electronic; impact factor 5.0)
DOI: 10.1016/j.ijepes.2024.110156
Published: 2024-08-01 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0142061524003776
Citations: 0
Abstract
To fully utilize the voltage-regulation capacity of flexible loads and distributed generations (DGs), we propose a novel Approximatively Completed Observed Markov Decision Process-based (ACOMDP-based) Reinforcement Learning (RL) scheme (ACMRL) for a multi-objective Automatic Voltage Control (AVC) problem considering Differential Increment Incentive Mechanism (DIIM)-based Incentive-Based Demand Response (IBDR). First, we propose a DIIM to motivate high-flexibility consumers to realize their maximum potential in real-time voltage control while ensuring the best economy. Second, we characterize the multi-objective AVC problem as an ACOMDP model, transformed from a Partially Observable Markov Decision Process (POMDP) model, by introducing a novel hidden system-state vector that incorporates the belief state and a high-confidence probability vector. The belief state and the high-confidence probability vector describe the probability distribution extracted from the historical observed states, capturing both the precise state and the uncertainty in the state-update process. The ACOMDP block is then fed into the RL block, which adopts a Modified Asynchronous Advantage Actor-Critic (MA3C) algorithm whose underlying network architecture embeds the Shared Modular Policies (SMP) module. The MA3C-based RL block, characterized by enhanced communication efficiency, rapidly generates optimal control actions even under substantial uncertainty. Case studies are conducted on a practical district in Suzhou, China, and simulation results validate the superior performance of the proposed methodology.
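The ACOMDP construction builds on the standard POMDP belief state: the posterior distribution over hidden system states given the history of observations. As a minimal illustrative sketch of that underlying idea (a generic discrete Bayes-filter update, not the paper's ACOMDP formulation; `belief_update` and the toy matrices below are hypothetical), a belief-state update looks like:

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One Bayesian belief-state update for a discrete POMDP.

    belief: prior probability over hidden states, shape (S,)
    T: transition model, T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: observation model, O[a, s2, o] = P(o | s2, a), shape (A, S, O)
    """
    # Predict step: propagate the belief through the transition model.
    predicted = belief @ T[action]
    # Correct step: weight each successor state by the observation likelihood.
    unnorm = predicted * O[action][:, observation]
    # Normalize so the result is again a probability distribution.
    return unnorm / unnorm.sum()

# Toy 2-state, 1-action, 2-observation example (hypothetical numbers).
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # shape (A=1, S=2, S'=2)
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])   # shape (A=1, S'=2, O=2)
b0 = np.array([0.5, 0.5])                  # uniform prior over states
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
```

After observing observation 0 (which is more likely in state 0 under this toy model), the posterior `b1` shifts probability mass toward state 0; the paper's ACOMDP additionally pairs such a belief with a high-confidence probability vector to quantify uncertainty in the state update.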
Journal description
The journal covers theoretical developments in electrical power and energy systems and their applications. The coverage embraces: generation and network planning; reliability; long- and short-term operation; expert systems; neural networks; object-oriented systems; system control centres; database and information systems; state and parameter estimation; system security and adequacy; network theory, modelling and computation; small- and large-system dynamics; dynamic model identification; on-line control including load and switching control; protection; distribution systems; energy economics; impact of non-conventional systems; and man-machine interfaces.
As well as original research papers, the journal publishes short contributions, book reviews and conference reports. All papers are peer-reviewed by at least two referees.