Title: Multi-agent reinforcement learning in a new transactive energy mechanism
Authors: Hossein Mohsenzadeh-Yazdi, Hamed Kebriaei, Farrokh Aminifar
DOI: 10.1049/gtd2.13244
Journal: IET Generation, Transmission & Distribution, vol. 18, no. 18, pp. 2943-2955 (Q3, Engineering, Electrical & Electronic; IF 2.0)
Publication date: 2024-08-22 (Journal Article)
Article page: https://onlinelibrary.wiley.com/doi/10.1049/gtd2.13244
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/gtd2.13244
Citations: 0
Abstract
Multi-agent reinforcement learning in a new transactive energy mechanism
Reinforcement learning (RL) enables more convenient and more economical decision-making in situations with high uncertainty. Accordingly, this article proposes that prosumers apply RL to earn more profit in the transactive energy market (TEM). An environment representing a novel TEM framework is designed, in which all participants submit their bids to the framework and receive their profits from it. New state-action spaces are designed for sellers and buyers so that they can apply the Soft Actor-Critic (SAC) algorithm, which is suited to continuous state-action spaces, to converge to the best policy; the algorithm is briefly described. First, the algorithm is implemented for a single agent (one seller and one buyer). Then all players, sellers and buyers alike, apply the algorithm in a multi-agent setting. The resulting game between participants is investigated, and it is analyzed whether the players converge to a Nash equilibrium (NE). Finally, numerical results for the IEEE 33-bus distribution system illustrate the effectiveness of the new TEM framework: applying SAC with the new state-action spaces increases sellers' and buyers' profits, and the multi-agent implementation of SAC demonstrates that players converge either to the unique NE or to one of multiple NEs of the game. The results show that buyers converge to their optimal policies within 80 days, while sellers reach optimality after 150 days in the games created between all participants.
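The abstract's core mechanism, participants submitting price bids to a central framework and receiving profits from it, can be sketched as a toy double-auction clearing step. Everything here (the midpoint clearing rule, the cost/value parameters, the function names) is an illustrative assumption, not the paper's actual TEM mechanism:

```python
import numpy as np

def clear_market(sell_bids, buy_bids):
    """Toy uniform-price clearing: match the cheapest seller offers with the
    highest buyer bids while a trade remains mutually beneficial.

    sell_bids / buy_bids: arrays of offered / bid prices per unit of energy.
    Returns (clearing_price, traded_pairs) where traded_pairs lists
    (seller_idx, buyer_idx) for each matched unit.
    """
    sellers = np.argsort(sell_bids)        # ascending seller offers
    buyers = np.argsort(buy_bids)[::-1]    # descending buyer bids
    pairs, price = [], None
    for s, b in zip(sellers, buyers):
        if sell_bids[s] <= buy_bids[b]:    # trade is individually rational
            pairs.append((int(s), int(b)))
            # uniform price: midpoint of the marginal matched pair
            price = 0.5 * (sell_bids[s] + buy_bids[b])
        else:
            break
    return price, pairs

def round_profits(sell_bids, buy_bids, seller_costs, buyer_values):
    """Per-agent profit from one clearing round; this profit would serve as
    the RL reward signal for each bidding agent (illustrative)."""
    price, pairs = clear_market(sell_bids, buy_bids)
    seller_profit = np.zeros(len(sell_bids))
    buyer_profit = np.zeros(len(buy_bids))
    for s, b in pairs:
        seller_profit[s] = price - seller_costs[s]   # revenue minus cost
        buyer_profit[b] = buyer_values[b] - price    # value minus payment
    return seller_profit, buyer_profit
```

In an RL formulation of this sketch, each agent's action would be its bid price and its reward the profit returned by `round_profits`; a continuous-action method such as SAC fits because bid prices are real-valued.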
About the journal:
IET Generation, Transmission & Distribution is intended as a forum for the publication and discussion of current practice and future developments in electric power generation, transmission and distribution. Practical papers in which examples of good present practice can be described and disseminated are particularly sought. Papers of high technical merit relying on mathematical arguments and computation will be considered, but authors are asked to relegate, as far as possible, the details of analysis to an appendix.
The scope of IET Generation, Transmission & Distribution includes the following:
Design of transmission and distribution systems
Operation and control of power generation
Power system management, planning and economics
Power system operation, protection and control
Power system measurement and modelling
Computer applications and computational intelligence in power engineering
Flexible AC or DC transmission systems
Special Issues. Current call for papers:
Next Generation of Synchrophasor-based Power System Monitoring, Operation and Control - https://digital-library.theiet.org/files/IET_GTD_CFP_NGSPSMOC.pdf