{"title":"Multi-agent reinforcement learning in a new transactive energy mechanism","authors":"Hossein Mohsenzadeh-Yazdi, Hamed Kebriaei, Farrokh Aminifar","doi":"10.1049/gtd2.13244","DOIUrl":null,"url":null,"abstract":"<p>Thanks to reinforcement learning (RL), decision-making is more convenient and more economical in different situations with high uncertainty. In line with the same fact, it is proposed that prosumers can apply RL to earn more profit in the transactive energy market (TEM). In this article, an environment that represents a novel framework of TEM is designed, where all participants send their bids to this framework and receive their profit from it. Also, new state-action spaces are designed for sellers and buyers so that they can apply the Soft Actor-Critic (SAC) algorithm to converge to the best policy. A brief of this algorithm, which is for continuous state-action space, is described. First, this algorithm is implemented for a single agent (a seller and a buyer). Then we consider all players including sellers and buyers who can apply this algorithm as Multi-Agent. In this situation, there is a comprehensive game between participants that is investigated, and it is analyzed whether the players converge to the Nash equilibrium (NE) in this game. Finally, numerical results for the IEEE 33-bus distribution power system illustrate the effectiveness of the new framework for TEM, increasing sellers' and buyers' profits by applying SAC with the new state-action spaces. SAC is implemented as a Multi-Agent, demonstrating that players converge to a singular or one of the multiple NEs in this game. The results demonstrate that buyers converge to their optimal policies within 80 days, while sellers achieve optimality after 150 days in the games created between all participants.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/gtd2.13244","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"5","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/gtd2.13244","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
Abstract
Reinforcement learning (RL) enables more convenient and more economical decision-making in situations with high uncertainty. Building on this, it is proposed that prosumers apply RL to earn greater profit in the transactive energy market (TEM). In this article, an environment representing a novel TEM framework is designed, in which all participants submit their bids to the framework and receive their profits from it. New state-action spaces are also designed for sellers and buyers so that they can apply the Soft Actor-Critic (SAC) algorithm, which operates on continuous state-action spaces and is briefly described, to converge to the best policy. The algorithm is first implemented for single agents (one seller and one buyer). All players, sellers and buyers alike, are then considered as agents applying the algorithm in a multi-agent setting. The resulting game among participants is investigated, and it is analyzed whether the players converge to a Nash equilibrium (NE). Finally, numerical results for the IEEE 33-bus distribution power system illustrate the effectiveness of the new TEM framework: applying SAC with the new state-action spaces increases both sellers' and buyers' profits. When SAC is implemented as a multi-agent method, the players converge either to a unique NE or to one of multiple NEs of the game. The results show that, in the games formed among all participants, buyers converge to their optimal policies within 80 days, while sellers reach optimality after 150 days.
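To make the setup concrete, the sketch below shows how a single seller agent with a continuous bid action could be trained with an off-the-shelf SAC implementation. This is a minimal illustration, not the paper's framework: the two-dimensional state (demand forecast, previous clearing price), the bid-price action, the toy market-clearing rule, and the class name ToySellerTEMEnv are all assumptions made for this example.

```python
# Minimal sketch (assumed, not the paper's implementation): a hypothetical
# single-seller bidding environment for a transactive energy market, trained
# with the SAC implementation from stable-baselines3.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC


class ToySellerTEMEnv(gym.Env):
    """Hypothetical seller agent that chooses a bid price each trading period."""

    def __init__(self, marginal_cost=0.05, price_cap=0.20, horizon=24):
        super().__init__()
        self.marginal_cost = marginal_cost  # assumed production cost ($/kWh)
        self.price_cap = price_cap          # maximum allowed bid ($/kWh)
        self.horizon = horizon              # trading periods per episode
        # State: [demand forecast, previous clearing price (normalized)]
        self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
        # Action: continuous bid price, as required by SAC
        self.action_space = spaces.Box(0.0, price_cap, shape=(1,), dtype=np.float32)

    def _demand(self):
        # Toy sinusoidal daily demand profile
        return 0.5 + 0.4 * np.sin(2 * np.pi * self.t / self.horizon)

    def _obs(self):
        return np.array([self._demand(), self.prev_price / self.price_cap],
                        dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.prev_price = self.marginal_cost
        return self._obs(), {}

    def step(self, action):
        bid = float(np.clip(action[0], 0.0, self.price_cap))
        # Toy clearing rule: lower bids win a larger share of demand
        cleared_qty = self._demand() * max(0.0, 1.0 - bid / self.price_cap)
        reward = cleared_qty * (bid - self.marginal_cost)  # seller profit
        self.prev_price = bid
        self.t += 1
        terminated = self.t >= self.horizon
        return self._obs(), reward, terminated, False, {}


if __name__ == "__main__":
    env = ToySellerTEMEnv()
    model = SAC("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=5_000)  # short demonstration run
```

In the multi-agent case described in the abstract, each seller and buyer would hold its own policy of this kind and the market clearing would couple their rewards, which is what gives rise to the game whose NE convergence is analyzed.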