Enhancing Twin Delayed Deep Deterministic Policy Gradient with Cross-Entropy Method

2021 8th NAFOSTED Conference on Information and Computer Science (NICS). Pub Date: 2021-12-21. DOI: 10.1109/NICS54270.2021.9701549

Hieu Trung Nguyen, Khang Tran, N. H. Luong


Citations: 3

Abstract

Hybridizations of Deep Reinforcement Learning (DRL) and Evolutionary Computation (EC) methods have recently shown considerable success in a variety of high-dimensional physical control tasks. These hybrid frameworks offer more robust mechanisms of exploration and exploitation in the policy network parameter search space by stabilizing the gradient-based updates of DRL algorithms with population-based operations adopted from EC methods. In this paper, we propose a novel hybrid framework that effectively combines the efficiency of DRL updates and the stability of EC populations. We experiment with integrating the Twin Delayed Deep Deterministic Policy Gradient (TD3) and the Cross-Entropy Method (CEM). The resulting EC-enhanced TD3 algorithm (eTD3) is compared with the baseline algorithm TD3 and a state-of-the-art evolutionary reinforcement learning (ERL) method, CEM-TD3. Experimental results on five MuJoCo continuous control benchmark environments confirm the efficacy of our approach. The source code of the paper is available at https://github.com/ELO-Lab/eTD3.
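For readers unfamiliar with the population-based side of the hybrid, the following is a minimal sketch of a generic Cross-Entropy Method on a toy objective. It is not the eTD3 algorithm and is not taken from the authors' repository. The names `cem_maximize` and `toy_score`, the diagonal Gaussian, and every hyperparameter value are assumptions chosen for illustration. In an ERL setting the candidate vectors would presumably be policy network parameters and the score an episode return, but that wiring is not shown here.

```python
import random
import statistics


def cem_maximize(score_fn, dim, pop_size=50, elite_frac=0.2,
                 iterations=40, init_std=2.0, min_std=1e-3, seed=0):
    """Generic CEM with a diagonal Gaussian (illustrative, not eTD3)."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(pop_size * elite_frac))

    for _ in range(iterations):
        # Sample a population of candidate parameter vectors.
        population = [
            [rng.gauss(mean[i], std[i]) for i in range(dim)]
            for _ in range(pop_size)
        ]
        # Keep the highest-scoring candidates (the elites).
        elites = sorted(population, key=score_fn, reverse=True)[:n_elite]
        # Refit the sampling distribution to the elites.
        for i in range(dim):
            column = [elite[i] for elite in elites]
            mean[i] = statistics.fmean(column)
            # Floor the spread so sampling never collapses to a point.
            std[i] = max(statistics.pstdev(column), min_std)
    return mean


def toy_score(theta):
    """Hypothetical stand-in for an episode return: peak at `target`."""
    target = [1.0, -2.0, 0.5]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


best = cem_maximize(toy_score, dim=3)
print([round(b, 2) for b in best])
```

The loop uses only scores, never gradients, which is the property that lets a population-based method complement gradient-based updates such as those of TD3.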
