Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Xiao-Yan Sun, Jinchao Chen, Chenglie Du, Mengying Zhan
{"title":"Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay","authors":"Xiao-Yan Sun, Jinchao Chen, Chenglie Du, Mengying Zhan","doi":"10.1109/IAEAC54830.2022.9929494","DOIUrl":null,"url":null,"abstract":"In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control, autonomous UAV operations, etc. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm has been used in various simulation environments as a classic reinforcement algorithm, its training efficiency is low and the convergence speed is slow due to its original experience playback mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the time series correlation between data samples. However, the experience replay mechanism does not take advantage of important samples. Therefore, the paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which modifies the traditional random experience replay into classification experience replay. Classified storage can make full use of important samples. At the same time, the Critic network and the Actor network are updated asynchronously, and the learned better Critic network is used to guide the Actor network update. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.","PeriodicalId":349113,"journal":{"name":"2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC )","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC )","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IAEAC54830.2022.9929494","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classic reinforcement learning algorithm that has been used in various simulation environments, its training efficiency is low and its convergence is slow due to its original experience replay mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but it makes no particular use of important samples. Therefore, this paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay: storing samples by class allows important samples to be fully exploited. At the same time, the Critic network and the Actor network are updated asynchronously, so that the better-trained Critic network guides the updates of the Actor network. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
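
The abstract describes classification experience replay only at a high level, so the following Python sketch shows one plausible reading: transitions are routed into separate pools at storage time, and each training batch draws a fixed mix from the pools so that important samples are replayed more often than uniform sampling would allow. This is a minimal illustration, not the authors' implementation; the class name ClassifiedReplayBuffer, the reward-threshold classification rule, and the sampling ratio are all assumptions made here for concreteness.

import random
from collections import deque

class ClassifiedReplayBuffer:
    """Replay buffer with classified storage (illustrative sketch)."""

    def __init__(self, capacity, reward_threshold=0.0, important_ratio=0.5):
        # Two pools: "important" transitions and ordinary ones.
        # A bounded deque discards its oldest entry when full.
        self.important = deque(maxlen=capacity)
        self.ordinary = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold  # assumed classification rule
        self.important_ratio = important_ratio    # assumed sampling mix

    def add(self, state, action, reward, next_state, done):
        # Classified storage: route each transition by its reward.
        transition = (state, action, reward, next_state, done)
        if reward > self.reward_threshold:
            self.important.append(transition)
        else:
            self.ordinary.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share from each pool, then shuffle the batch.
        n_imp = min(int(batch_size * self.important_ratio), len(self.important))
        n_ord = min(batch_size - n_imp, len(self.ordinary))
        batch = (random.sample(list(self.important), n_imp)
                 + random.sample(list(self.ordinary), n_ord))
        random.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.important) + len(self.ordinary)

For example, with reward_threshold=0.5 and important_ratio=0.5, half of every batch comes from high-reward transitions as long as enough of them exist, and sampling falls back to ordinary transitions otherwise.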
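The asynchronous Critic/Actor update can likewise be sketched as a delayed policy update consistent with the description above: the Critic is trained on every batch, while the Actor is updated less frequently so that its gradient comes from a better-trained Critic. The update callables and the delay of 2 steps below are hypothetical placeholders, not values taken from the paper.

def train(buffer, update_critic, update_actor,
          total_steps, batch_size=64, actor_delay=2):
    # Asynchronous schedule (assumed interpretation): the Critic learns
    # on every step; the Actor is updated only every `actor_delay`
    # steps, so it is guided by a fresher, better-trained Critic.
    for step in range(total_steps):
        if len(buffer) < batch_size:
            continue  # wait until the buffer holds a full batch
        batch = buffer.sample(batch_size)
        update_critic(batch)       # every step
        if step % actor_delay == 0:
            update_actor(batch)    # delayed, Critic-guided update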