Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay

Xiao-Yan Sun, Jinchao Chen, Chenglie Du, Mengying Zhan
{"title":"Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay","authors":"Xiao-Yan Sun, Jinchao Chen, Chenglie Du, Mengying Zhan","doi":"10.1109/IAEAC54830.2022.9929494","DOIUrl":null,"url":null,"abstract":"In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control, autonomous UAV operations, etc. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm has been used in various simulation environments as a classic reinforcement algorithm, its training efficiency is low and the convergence speed is slow due to its original experience playback mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the time series correlation between data samples. However, the experience replay mechanism does not take advantage of important samples. Therefore, the paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which modifies the traditional random experience replay into classification experience replay. Classified storage can make full use of important samples. At the same time, the Critic network and the Actor network are updated asynchronously, and the learned better Critic network is used to guide the Actor network update. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.","PeriodicalId":349113,"journal":{"name":"2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC )","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC )","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IAEAC54830.2022.9929494","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, multi-agent reinforcement learning has been applied in many fields, such as urban traffic control and autonomous UAV operations. Although the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classic reinforcement learning algorithm that has been used in various simulation environments, its training efficiency is low and its convergence is slow due to its original experience replay mechanism and network structure. The random experience replay mechanism adopted by the algorithm breaks the temporal correlation between data samples, but it makes no particular use of important samples. Therefore, this paper proposes a Multi-Agent Deep Deterministic Policy Gradient method based on classification experience replay, which replaces the traditional random experience replay with classification experience replay: storing samples by class allows important samples to be fully exploited. At the same time, the Critic network and the Actor network are updated asynchronously, so that the better-trained Critic network guides the updates of the Actor network. Finally, to verify the effectiveness of the proposed algorithm, the improved algorithm is compared with the traditional MADDPG method in a simulation environment.
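
The abstract describes classification experience replay only at a high level, so the following Python sketch shows one plausible reading: transitions are routed into separate pools at storage time, and each training batch draws a fixed mix from the pools so that important samples are replayed more often than uniform sampling would allow. This is a minimal illustration, not the authors' implementation; the class name ClassifiedReplayBuffer, the reward-threshold classification rule, and the sampling ratio are all assumptions made here for concreteness.

import random
from collections import deque

class ClassifiedReplayBuffer:
    """Replay buffer with classified storage (illustrative sketch)."""

    def __init__(self, capacity, reward_threshold=0.0, important_ratio=0.5):
        # Two pools: "important" transitions and ordinary ones.
        # A bounded deque discards its oldest entry when full.
        self.important = deque(maxlen=capacity)
        self.ordinary = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold  # assumed classification rule
        self.important_ratio = important_ratio    # assumed sampling mix

    def add(self, state, action, reward, next_state, done):
        # Classified storage: route each transition by its reward.
        transition = (state, action, reward, next_state, done)
        if reward > self.reward_threshold:
            self.important.append(transition)
        else:
            self.ordinary.append(transition)

    def sample(self, batch_size):
        # Draw a fixed share from each pool, then shuffle the batch.
        n_imp = min(int(batch_size * self.important_ratio), len(self.important))
        n_ord = min(batch_size - n_imp, len(self.ordinary))
        batch = (random.sample(list(self.important), n_imp)
                 + random.sample(list(self.ordinary), n_ord))
        random.shuffle(batch)
        return batch

    def __len__(self):
        return len(self.important) + len(self.ordinary)

For example, with reward_threshold=0.5 and important_ratio=0.5, half of every batch comes from high-reward transitions as long as enough of them exist, and sampling falls back to ordinary transitions otherwise.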
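The asynchronous Critic/Actor update can likewise be sketched as a delayed policy update consistent with the description above: the Critic is trained on every batch, while the Actor is updated less frequently so that its gradient comes from a better-trained Critic. The update callables and the delay of 2 steps below are hypothetical placeholders, not values taken from the paper.

def train(buffer, update_critic, update_actor,
          total_steps, batch_size=64, actor_delay=2):
    # Asynchronous schedule (assumed interpretation): the Critic learns
    # on every step; the Actor is updated only every `actor_delay`
    # steps, so it is guided by a fresher, better-trained Critic.
    for step in range(total_steps):
        if len(buffer) < batch_size:
            continue  # wait until the buffer holds a full batch
        batch = buffer.sample(batch_size)
        update_critic(batch)       # every step
        if step % actor_delay == 0:
            update_actor(batch)    # delayed, Critic-guided update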